Compiling
With file ending.
Make can be used to compile most files. Then one can execute them without file ending.
Basic C syntax
No need to go over this.
Some more compiling
Make is essentially calling the language specific compilers. For C clang is called. With clang includes need to be manually added as an argument. Make does this automatically.
Compiling is generally done in multiple steps:
- Preprocessing:
- adding in includes and macros
- removing comments
- Compiling
- converts code to assembly
- Assembling
- converts assembly to binary, which is machine code and can be run on a CPU.
- Linking
- puts compiled includes in the binary code. No need to compile includes multiple times.
Debugging
Bugs are errors in a program, so that it performs differently than expected. Finding and fixing these errors is called Debugging.
Search
Arrays are just lists of entries. Computers can only look at one entry at a time, so search algorithms are needed to look up specific entries.
Big O
Most search algorithms try to achieve the same thing. The main difference is the running time. This is not exactly in seconds but as the complexity of the algorithm.
To do that one uses the big O notation, which describes how much time the algorithm takes approximately dependent on the size of the problem. The most common running times are:
- \(O(n^2)\)
- \(O(n \log n)\)
- \(O(n)\)
- \(O(\log n)\)
- \(O(1)\)
The \(O\) describes the upper bound of time steps an algorithm takes. The lower bound is described by \(\Omega\), and if the two are the same one uses \(\Theta\).
Different search algorithms
Here.
Different sorting algorithms
Some Sort Algorithms.
Recursion
Recursion can be helpful to express logic, for example binary search. One needs to be careful when defining the breaking condition, so not too much memory is used by going too deep.
Pointers are variables which store memory addresses where the values of other variables might be stored. It's important to know the difference, so not to copy the address and think one copied the value of the variable.
The Syntax for arrays just uses the address of the first element and adds the indices of the successive elements to that address. Same is happening with strings. Strings are just one pointer to the first character. The computer looks at the successive addresses and stops at the element \0
.
It's important to know that when accessing uninitialized memory one can see values that have been saved by previous programs at that address. This can be dangerous, if there are passwords saved for example.
One can check memory errors with valgrind on the command line.
All this is important if one wants to make the program as efficient as possible and debug deep down. But for high level languages like python this is not as important as python mostly handles this for you, with less efficiency.
Linked lists
Linked lists are list where the elements are not stored behind each other in memory but at separate places. After the first value one needs to store a pointer to the next value, and so on.
If one wants to add a element to a normal array and the memory slot after the originally last element is already full, we have an issue. We can either copy the array to a new location in memory with enough space, which requires some runtime, or we use a linked list where adding a new element is trivial, as the element after the last element is always reserved for a pointer to a potential new element.
So if we have a constant length list, use a normal array, as the linked list would require more memory. If we might change the length of the list, use a linked list, as the overhead is less than the potential cost of copying a normal array.
Trees
Trees are just defined by nodes. A node is a data structure which has one value and can have multiple pointers to child nodes.
Other data structures
There are several other data structures like queues (first-in-first-out), stacks (last-in-first-out) and dictionaries.
Learning a new programming language
Most programming languages are pretty similar. All of them have conditions, operators, data structures and other things. Differences are often just syntax or bigger things like how they handle scopes of variables and types. All in all, if you have mastered one language it is pretty easy to learn another language up to a decent level.
Data processing
When downloading datasets or even collecting them yourself, most of them are not cleaned. Which means, there might be typos, different names for the same thing, columns which should be multiple columns and much more ugliness. Python is well equipped to clean data, especially with the help of regular expressions. However for quick fixes or searches a database language like sqlite is probably easier. To combine both, one can execute sql commands from within python with the sqlite library.
When working with databases the most important thing is to escape user input to avoid injection attacks. When working with multiple servers and multiple users one should lock data that is currently changed by one server. Otherwise a second server might change the same data at the same time and one change gets lost or even worse unintended stuff happens.
The internet
The internet is basically just a big web of all the routers and in extension the computers/devices and servers of a lot of people in the world. One can send information to any other point in that web. To achieve that, one needs the internet protocol (IP) to tell the routers where to send the information.
TPC/UDP
TPC is another protocol that helps with sending information to different programs of one IP address. It also allows sending large chunks of data in multiple parts. If the user has a bad connection certain parts can be sent again instead of all the parts. UDP is a protocol that allows sending large amounts of data, but it doesn't grantee delivery. This is useful for calls or other real time applications as one doesn't want to wait for new data, just to resend earlier data to get the perfect result.
DNS
When you type in a address in your web browser the computer and then your router somehow needs to know what IP address corresponds to the web address. This is done with DNS servers, which have huge lists which save these correspondences. So your router always first contacts a DNS server and gets a IP address back, which then can be used to contact the correct server to get the information one wants.
Clientside
HTTP
Browsers use Hypertext Transfer Protocol (HTTP) to interface with TPC/IP packets. HTTPS ensures the packets that arrive at the browser are encrypted.
URL
A web address like https://www.google.com is also called a URL.
GET/POST
GET and POST requests can be used by browsers to request content from servers.
You can use curl
on the command line to check the headers of the responses of servers to GET requests.
curl -I -X GET https://www.harvard.edu/
The status codes one gets back can then be interpreted and used to modify the request to get the correct response.
HTML
Hypertext Markup Language is used to tell the browser what and how to display information. It is however not a programming language.
CSS
To style HTML one can use Cascading Style Sheets (CSS) which isn't a programming language either.
JavaScript
To change elements and styling one can use JavaScript, which is a programming language. It will be executed on the device of the user.
Web server programming
A framework like flask or django can be used to program a server with python to send responses to users. So when the user types in a certain URL or clicks on a link, the server sends data in terms of a HTML page, CSS and JavaScript back. This data can be dynamically generated with the full power of python. The python code then communicates on the server with a database, sometimes on another server.
This enables accounts and other things where the website needs to remember stuff about the user. Often times, for example for autocomplete it is helpful to use a mix of JavaScript and serverside code to accelerate the results, as responses from the server take some time compared to calculations on device.