Artificial Intelligence

November 21, 2023 · One min read

General

General methods are almost always better than imposing human knowledge on the system (e.g. AlphaGo Zero, or deep learning versus hand-crafted edge detection in image processing). The only case where human knowledge might help a bit is when a product does not work without a lot of data and that data is only obtainable during operation, for example self-driving cars. Then it might be helpful to first develop a lower-capability system with the help of human knowledge, in order to have a sellable product and collect the data needed to train a system from scratch without human knowledge (e.g. Tesla FSD).

Bayesian thinking

November 21, 2023 · 0 min read

classes

November 21, 2023 · One min read

Done: 25: I have all core modules (digital transmission and digital signal processing not yet passed, high-frequency engineering still from the bachelor's)

25: conv opt, antennas, channel coding, mobile communications, global nav

2.5: elective from the FAU course offerings (Introduction to ML)

2.5: main seminar from my field of study completed (audio processing)

In progress: 2.5 lab course from my field of study (signal processing ML)

Missing: Lectures: 5 specialization (high-frequency engineering?) 12.5 electives from FAU

Lab courses and seminars: main seminar from FAU 2.5 lab course from the TechFak offerings 2.5 main seminar from the FAU offerings 10 research internship

30 missing for the master's thesis

Total: done: 55 in progress: 2.5 still needed: 17.5 (lectures) 15 (lab courses/seminars)

30 (master's thesis)

120

Coding

November 21, 2023 · One min read

Gitignore

Just select a .gitignore template when creating a repository on GitHub. Overview can be found here.

Python random seed

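The seed needs to be set in the file that is actually executed, before any function that draws random numbers is called. A seed call that only sits in another file's main block is not run when a function is imported from that file, so the imported function then runs unseeded. Same for PyTorch, which has its own generator and needs its own seed call (torch.manual_seed).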
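A minimal sketch of the seeding behaviour, using only the standard library. The helper name draw is hypothetical and stands in for a function imported from another file. The random module keeps one generator for the whole process, so seeding in the executed script before the call is enough to make the result repeatable.

```python
import random


def draw(n):
    """Hypothetical helper; imagine it lives in another file and is imported."""
    return [random.randint(0, 100) for _ in range(n)]


# Seed once in the script that is executed, before any random draws.
random.seed(42)
first = draw(5)

# Reseeding with the same value reproduces the same sequence.
random.seed(42)
second = draw(5)

print(first == second)  # True
```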

Vim tips

gf to open file under cursor (markdown internal links)
gx to open file under cursor with default program (images, urls)

Local server

If you get slow load times on every other request only in Chrome, use 127.0.0.1 instead of localhost. I think it has something to do with Chrome trying to resolve localhost to IPv6 first.

Cool python libraries

icecream for nicer printing/logging

Python flask sqlite

When accessing the database, the fetchall() function returns a list of SQL row objects. To access a specific column of one row in Python, you need to use bracket notation [string]. In the Jinja template you can use either dot notation .string (without quotation marks, like accessing an attribute of an object) or bracket notation.
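A small sketch of the Python side, assuming the connection uses sqlite3.Row as its row factory (which is what "row objects" suggests). The users table and its contents are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical "users" table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become sqlite3.Row objects instead of tuples
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

rows = conn.execute("SELECT id, name FROM users").fetchall()
first_name = rows[0]["name"]  # bracket notation with the column name
print(first_name)  # alice
conn.close()
```

In a Jinja template the same value would be written as row.name or row["name"] inside the double curly braces.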

CS50

November 21, 2023 · 8 min read

Lecture 0

Lecture 1

Compiling

make <program_name>

Without the file ending (e.g. make hello for the source file hello.c).

Make can be used to compile most files. The result can then be executed, again without file ending.

./<program_name>

Basic C syntax

No need to go over this.

Lecture 2

Some more compiling

Make essentially calls the language-specific compilers. For C, clang is called. When calling clang directly, libraries that the program includes need to be linked manually with an argument (e.g. -lcs50). In the CS50 environment, make does this automatically.

Compiling is generally done in multiple steps:

Preprocessing
- adds in includes and macros
- removes comments
Compiling
- converts code to assembly
Assembling
- converts assembly to binary machine code, which can be run on a CPU
Linking
- combines the assembled code with the already compiled libraries it includes into one binary, so libraries do not need to be compiled multiple times

Debugging

Bugs are errors in a program that make it behave differently than expected. Finding and fixing these errors is called debugging.

Lecture 3

Search

Arrays are just lists of entries. Computers can only look at one entry at a time, so search algorithms are needed to look up specific entries.

Big O

Most search algorithms achieve the same thing. The main difference is their running time, which is not measured in seconds but expressed as the complexity of the algorithm.

For that one uses big O notation, which describes approximately how the time an algorithm takes grows with the size of the problem. The most common running times, from slowest to fastest, are:

\(O(n^2)\)
\(O(n \log n)\)
\(O(n)\)
\(O(\log n)\)
\(O(1)\)

The \(O\) describes the upper bound of time steps an algorithm takes. The lower bound is described by \(\Omega\), and if the two are the same one uses \(\Theta\).
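To make the difference between \(O(n)\) and \(O(\log n)\) concrete, here is a rough sketch that counts how many entries two search algorithms inspect in the worst case. The function names are made up for this illustration, and binary search assumes the list is sorted.

```python
def linear_search_steps(items, target):
    """Return how many entries are inspected until target is found."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(items, target):
    """Same count for binary search; items must be sorted."""
    low, high = 0, len(items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps


for n in (1_000, 100_000):
    data = list(range(n))
    worst = n - 1  # last entry: worst case for linear search
    print(n, linear_search_steps(data, worst), binary_search_steps(data, worst))
```

The linear count grows in step with n, while the binary count grows by only a handful of steps when n becomes a hundred times larger.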

Different search algorithms

Here.

Different sorting algorithms

Some Sort Algorithms.

Recursion

Recursion can be helpful to express logic, for example binary search. One needs to be careful when defining the base case (the condition that stops the recursion), so that the recursion does not go too deep and use too much memory.
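As an illustration, binary search can be written recursively roughly like this. It is a sketch, not the course's reference code; the parameter names low and high are chosen here. Each call halves the search range, and the two base cases stop the recursion.

```python
def binary_search(items, target, low=0, high=None):
    """Return the index of target in sorted items, or -1 if absent."""
    if high is None:
        high = len(items) - 1
    if low > high:  # base case: empty range, target not present
        return -1
    mid = (low + high) // 2
    if items[mid] == target:  # base case: found
        return mid
    if items[mid] < target:
        return binary_search(items, target, mid + 1, high)
    return binary_search(items, target, low, mid - 1)


numbers = [1, 3, 5, 7, 9, 11]
print(binary_search(numbers, 7))  # 3
print(binary_search(numbers, 4))  # -1
```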

Lecture 4

Pointers are variables that store memory addresses, at which the values of other variables might be stored. It's important to know the difference, so as not to copy the address and think one copied the value of the variable.
The syntax for arrays just takes the address of the first element and adds the index of the requested element to it. The same happens with strings: a string is just a pointer to its first character. The computer looks at the successive addresses and stops at the element \0.
It's important to know that when accessing uninitialized memory one can see values that previous programs have saved at that address. This can be dangerous if, for example, passwords were stored there.
One can check for memory errors with valgrind on the command line.
All this is important if one wants to make a program as efficient as possible and debug deep down. For high-level languages like Python it matters less, as Python mostly handles memory for you, at the cost of some efficiency.
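Python hides raw pointers, but the address-versus-value distinction still shows up, so the following sketch uses it as an analogy rather than as real C pointer code. The second part uses the standard ctypes module to show the terminating \0 of a C-style string; all variable names are made up here.

```python
import ctypes

# Copying the reference ("address") vs. copying the value.
original = [1, 2, 3]
alias = original            # both names refer to the same list object
real_copy = list(original)  # a new list with the same contents

alias.append(4)
print(original)   # [1, 2, 3, 4], changed through the alias
print(real_copy)  # [1, 2, 3], the real copy is unaffected

# A C-style string: the bytes of the text followed by a terminating \0.
buf = ctypes.create_string_buffer(b"hi")
print(ctypes.sizeof(buf))  # 3: 'h', 'i' and the trailing \0
print(buf.raw)             # b'hi\x00'
```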

Lecture 5

Linked lists

Linked lists are lists where the elements are not stored one after another in memory but at separate places. Together with each value one needs to store a pointer to the next value, and so on.
If one wants to add an element to a normal array and the memory slot after the current last element is already occupied, we have an issue. We can either copy the array to a new location in memory with enough space, which costs some runtime, or use a linked list, where adding a new element is trivial, as every element, including the last, reserves space for a pointer to a potential next element.
So if the list has a constant length, use a normal array, as the linked list would require more memory. If the length of the list might change, use a linked list, as its overhead is less than the potential cost of copying a normal array.
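A minimal sketch of a singly linked list, with class and method names chosen for this example. It shows why adding an element needs no copying: only one pointer is set, and no existing element is moved.

```python
class Node:
    """One element of a singly linked list: a value plus a pointer to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, value):
        """Insert at the front in constant time; no existing element is moved."""
        self.head = Node(value, self.head)

    def to_list(self):
        """Follow the next pointers from the head and collect the values."""
        values = []
        node = self.head
        while node is not None:
            values.append(node.value)
            node = node.next
        return values


numbers = LinkedList()
for value in (3, 2, 1):
    numbers.prepend(value)
print(numbers.to_list())  # [1, 2, 3]
```

Inserting at the front is used here because it needs no traversal; appending at the end would require either walking the list or keeping an extra pointer to the last node.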

Trees

Trees are just defined by nodes. A node is a data structure which has one value and can have multiple pointers to child nodes.
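Such a node could look roughly like this in Python. TreeNode and tree_sum are names invented for this sketch, and the list of children plays the role of the pointers to child nodes.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    value: int
    children: list = field(default_factory=list)  # pointers to child nodes


def tree_sum(node):
    """Add up the values of a node and all of its descendants."""
    return node.value + sum(tree_sum(child) for child in node.children)


root = TreeNode(1, [TreeNode(2, [TreeNode(4)]), TreeNode(3)])
print(tree_sum(root))  # 10
```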

Other data structures

There are several other data structures like queues (first-in-first-out), stacks (last-in-first-out) and dictionaries.
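In Python these three structures are available out of the box. A short sketch with made-up example values:

```python
from collections import deque

queue = deque()              # first in, first out
queue.append("a")
queue.append("b")
first_out = queue.popleft()  # "a" went in first, so it comes out first

stack = []                   # last in, first out
stack.append("a")
stack.append("b")
last_in = stack.pop()        # "b" went in last, so it comes out first

ages = {"alice": 30}         # dictionary: look up a value by its key
ages["bob"] = 25
print(first_out, last_in, ages["bob"])  # a b 25
```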

Lecture 6

Learning a new programming language

Most programming languages are pretty similar. All of them have conditions, operators, data structures and other things. Differences are often just syntax or bigger things like how they handle scopes of variables and types. All in all, if you have mastered one language it is pretty easy to learn another language up to a decent level.

Lecture 7

Data processing

When downloading datasets or even collecting them yourself, most of them are not clean. That means there might be typos, different names for the same thing, columns which should be multiple columns and much more ugliness. Python is well equipped to clean data, especially with the help of regular expressions. For quick fixes or searches, however, a database language like SQL (for example with SQLite) is probably easier. To combine both, one can execute SQL commands from within Python with the sqlite3 library.
When working with databases the most important thing is to escape user input, for example by passing it through query placeholders, to avoid injection attacks. When working with multiple servers and multiple users one should lock data that is currently being changed by one server. Otherwise a second server might change the same data at the same time and one change gets lost or, even worse, unintended things happen.
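The following sketch combines both ideas: a regular expression cleans inconsistent spellings, and the values are then inserted with ? placeholders so that input is never pasted into the SQL text. The shows table, the titles and the function name clean_title are assumptions made for the example.

```python
import re
import sqlite3


def clean_title(raw):
    """Trim, collapse repeated whitespace and normalise capitalisation."""
    return re.sub(r"\s+", " ", raw).strip().title()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (title TEXT)")

raw_titles = ["  the   office", "The Office ", "THE OFFICE"]
for raw in raw_titles:
    # The ? placeholder passes the value separately from the SQL text,
    # so user input cannot change the statement (no injection).
    conn.execute("INSERT INTO shows (title) VALUES (?)", (clean_title(raw),))

counts = conn.execute(
    "SELECT title, COUNT(*) FROM shows GROUP BY title"
).fetchall()
print(counts)  # [('The Office', 3)]
conn.close()
```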

Lecture 8

The internet

IP

The internet is basically just a big web of routers and, by extension, the computers, devices and servers of a lot of people in the world. One can send information to any other point in that web. To achieve that, one needs the Internet Protocol (IP) to tell the routers where to send the information.

TCP/UDP

TCP is another protocol that helps with sending information to different programs at one IP address (via port numbers). It also allows sending large chunks of data in multiple parts. If the user has a bad connection, only the missing parts need to be sent again instead of all the parts. UDP is a protocol that also allows sending large amounts of data, but it doesn't guarantee delivery. This is useful for calls or other real-time applications, as one doesn't want to hold back new data just to resend earlier data for a perfect result.

DNS

When you type an address into your web browser, the computer and then your router somehow need to know which IP address corresponds to the web address. This is done with DNS servers, which hold huge lists of these correspondences. So a DNS server is contacted first and returns an IP address, which can then be used to contact the correct server to get the information one wants.

Clientside

HTTP

Browsers use the Hypertext Transfer Protocol (HTTP) to communicate on top of TCP/IP packets. HTTPS ensures that the data exchanged between browser and server is encrypted.

URL

A web address like https://www.google.com is also called a URL.
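A URL has a fixed structure that can be taken apart with Python's standard library. In this sketch the path and query appended to the address are made up for illustration.

```python
from urllib.parse import urlparse

# The path and query here are made up for illustration.
parts = urlparse("https://www.google.com/search?q=cs50")
print(parts.scheme)  # https
print(parts.netloc)  # www.google.com
print(parts.path)    # /search
print(parts.query)   # q=cs50
```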

GET/POST

GET and POST requests can be used by browsers to request content from servers.
You can use curl on the command line to check the headers of the responses of servers to GET requests.

curl -I -X GET https://www.harvard.edu/

The status codes one gets back can then be interpreted and used to modify the request to get the correct response.
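As a rough aid for interpreting status codes, the standard library knows the official phrase for each code, and the first digit gives the class of the response. The helper name describe is made up for this sketch.

```python
from http import HTTPStatus


def describe(code):
    """Return the class of an HTTP status code and its standard phrase."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    kind = classes[code // 100]
    return f"{code} {HTTPStatus(code).phrase} ({kind})"


for code in (200, 301, 404, 500):
    print(describe(code))
```

A 301, for example, means the content has moved, so the request should be repeated against the address given in the Location header of the response.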

HTML

Hypertext Markup Language is used to tell the browser what and how to display information. It is however not a programming language.

CSS

To style HTML one can use Cascading Style Sheets (CSS) which isn't a programming language either.

JavaScript

To change elements and styling one can use JavaScript, which is a programming language. It will be executed on the device of the user.

Lecture 9

Web server programming

A framework like Flask or Django can be used to program a server in Python that sends responses to users. When the user types in a certain URL or clicks on a link, the server sends back data in the form of an HTML page, CSS and JavaScript. This data can be generated dynamically with the full power of Python. The Python code on the server then communicates with a database, which sometimes sits on another server.
This enables accounts and other features where the website needs to remember things about the user. Often, for example for autocomplete, it is helpful to use a mix of JavaScript and server-side code to speed up the results, as responses from the server take some time compared to calculations on the device.
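To show the idea without installing a framework, here is a sketch of a plain WSGI application, the interface that Flask and Django applications are served through. It is not Flask code; the route /hello, the name parameter and the fake start_response are assumptions made for this example.

```python
from html import escape
from urllib.parse import parse_qs


def app(environ, start_response):
    """Minimal WSGI application: route on the URL path, build HTML dynamically."""
    path = environ.get("PATH_INFO", "/")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    if path == "/hello":
        name = query.get("name", ["world"])[0]
        # escape() keeps user input from being interpreted as HTML
        status, body = "200 OK", f"<h1>Hello, {escape(name)}!</h1>"
    else:
        status, body = "404 Not Found", "<h1>Not found</h1>"
    payload = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


# Call the app directly with a fake request instead of starting a server.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status


result = app({"PATH_INFO": "/hello", "QUERY_STRING": "name=CS50"}, fake_start_response)
print(captured["status"], result[0].decode())  # 200 OK <h1>Hello, CS50!</h1>
```

To serve it for real, the standard library's wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever() would be one option for local experiments.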

Daily ToDos

November 21, 2023 · One min read

program autotuner for hyperparameter optimization
get basic blocks for good project structure going

Data Preparation and Feature Engineering

November 21, 2023 · 2 min read

SQL is probably most useful when it just comes to data manipulation and queries. Excel is easier because its UI makes it more "what you see is what you get". If you need to go beyond data manipulation into machine learning, Python is probably the best choice.
To learn SQL and use Python to establish a machine learning pipeline, the best approach might be to use Python to automate SQL commands. For quick stuff, Google Sheets is probably good to learn.

Overview

Machine Learning generally tries to recognize patterns in data to then generate new data points. To achieve that, one needs to generate and transform a dataset to feed into the algorithms.

Mainly just notes taken from Google.

Dataset Generation

Dataset Transformation

When to transform

Prior to training

Pros

computation only performed once

Cons

transformations need to be reproduced at prediction time, where new data can be unpredictable.
dataset generation needs to be rerun when changing transformations, which may lead to slow iterations. Not an issue with a small dataset.

Within the model

Pros

the same raw data can always be used, as the transformations happen in the model.
when changing transformations the same data is still used, which leads to fast iterations.

Cons

transformations can increase latency. This is also the case for transformations that are reproduced at prediction time.

Visualizations

Always look at graphs or other visualizations of your dataset, before and after transformations to detect errors or irregularities.

Normalization

When features have highly different ranges of numeric values, it is recommended to perform normalization. Otherwise gradient descent can have issues and converge slowly.
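The notes do not name a specific method, so the following sketch assumes z-score scaling, one common choice. The function name z_score and the two example features are made up.

```python
from statistics import mean, pstdev


def z_score(values):
    """Scale values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # constant feature: nothing to scale
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical features with very different ranges.
ages = [20, 30, 40, 50]
incomes = [20_000, 40_000, 60_000, 80_000]
print(z_score(ages))
print(z_score(incomes))
```

After scaling, both features cover the same range of values, so neither dominates the gradient because of its units.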

Training Guide

November 21, 2023 · One min read

Expand on this.

Docker

November 21, 2023 · One min read

I just wanted a raw Ubuntu install to test my dotfiles.

Create Dockerfile

FROM ubuntu:latest

Build image

docker build -t ubuntu .

-t creates a tag for this image, to reference it later.

Run image

docker run --name ubuntu -td ubuntu

--name gives the container a name, so you can reference it later.
-t allocates a pseudo-TTY, so the container does not exit once all processes defined in the Dockerfile have finished.
-d runs the container detached, in the background.

Attach to container

docker exec -i -t ubuntu /bin/bash

-i interactive mode
-t allocate a pseudo-TTY
/bin/bash runs bash in the container and attaches to it. ubuntu is the container name specified with --name.

Electric vehicles

November 21, 2023 · One min read

Ideally we would all be using public transport, bicycles and our legs. But humans want personal vehicles for various reasons. Currently we mainly use internal combustion engine (ICE) cars. Due to their CO2 emissions and the resulting human-made climate change, we need other solutions in the long term. Electric vehicles are the prime candidate for that position.

There are other possibilities like hydrogen fuel cells or biofuel engines. But these "solutions" might only be feasible in the future. We need a solution right now, and with battery electric vehicles (BEVs) we have everything we need. The only areas where BEVs lag behind other options are range and charging speed. But this is not a concern for most people in everyday life, because they can charge at home overnight and almost never need to drive the maximum range in their daily life.

The other concern many people have is that BEVs are actually not "greener" than ICE cars due to the high energy use when producing the battery, and that buying a Tesla is therefore not environmentally friendly. There is a more difficult and an easy argument against this.
