Session 1: C Pitfalls
Caution
You are not authorized to feed any of this content to any online LLM for any purpose. If, for some reason, you need to interact with an LLM, you may run an open-source model on your local machine and feed it the course content. See llama.cpp to install a CPU-efficient LLM inference engine and use your own computer to ask your questions.
In this first practical session, we’ll learn what a memory-safe programming language is. And the best way to understand memory-safety is to first have a look at memory-unsafety.
Is my RAM Unsafe?
Let’s open Godbolt’s compiler explorer, write a little C function that returns a character, then look at the produced x86 assembly code:

Evidently, trying to access the item at index 400 of a 3-item array won’t work. However, the code happily compiles! Look at the assembly code, instruction by instruction:
subl: We reserve 16 bytes on the stack by decrementing the ESP register (i.e., the stack pointer).
movzbl: We read a byte at address ESP + 413, then put it into the EAX register (at this point, we just need to know that the first byte of the list array is located at ESP + 13).
addl: We free 16 bytes from the stack.
ret: We finally jump back to the caller. The 32-bit return address is read then removed from the stack, all in one instruction. The caller expects the return value to be written in EAX, so that’s OK.
The movzbl instruction here shows us that C is a memory-unsafe programming language. Indeed, the memory address is invalid but the compiled program will try to read it nonetheless.
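The 413 in the movzbl operand is plain address arithmetic: the offset of the array's first byte (13) plus the index (400) times the size of one element (1 byte for a char). A minimal sketch of that computation follows. The helper name element_offset is hypothetical and only illustrates the formula; it is not something the compiler emits.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof(char) is 1 by definition in C, so each element takes one byte. */
static_assert(sizeof(char) == 1, "char is always one byte");

/* Hypothetical helper: byte offset of element `index`, given the offset
 * of the array's first element and the size of one element.
 * Note that nothing here checks `index` against the array length,
 * which mirrors what the compiled code does. */
size_t element_offset(size_t base_offset, size_t index, size_t elem_size) {
    return base_offset + index * elem_size;
}
```

With the values from the description above, element_offset(13, 400, sizeof(char)) gives 413, far beyond the last valid offset of the array, which is 13 + 2 = 15.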
Now, imagine that instead of a trivially hard-coded 400, we add a new index parameter. And suppose that, at some point, a malicious user can provide whatever value they want for it. Now you’ve got yourself a memory-safety issue:
char get_char(int index) {
char list[3] = { 24, 75, 3 };
return list[index];
}
Given an invalid index, in the best case, the operating system will notice at runtime and immediately kill the program. In the worst case, the program will read data it is not supposed to. To ensure index is valid, we must manually and explicitly do something such as:
#include <assert.h>
char get_char(int index) {
assert(index >= 0 && index <= 2); // Kills the program if not true.
char list[3] = { 24, 75, 3 };
return list[index];
}
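One caveat with the assert approach: assert is compiled out when the NDEBUG macro is defined, which is common in release builds, so the check can silently disappear. As a hedged alternative, here is a sketch of a variant that keeps the check in every build and reports the problem to the caller instead of killing the program. The name try_get_char and its signature are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical variant of get_char: returns true and writes the element
 * to *out when index is valid, returns false otherwise. */
bool try_get_char(int index, char *out) {
    static const char list[3] = { 24, 75, 3 };
    /* Derive the element count from the array itself instead of
     * hard-coding 3 a second time. */
    const int count = (int)(sizeof list / sizeof list[0]);

    if (out == NULL || index < 0 || index >= count) {
        return false; /* Refuse instead of reading out of bounds. */
    }
    *out = list[index];
    return true;
}
```

The caller now has to test the return value before using the result, which is one more thing that can be forgotten.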
The burden of memory-safety is on the programmers’ shoulders.
A memory-safe language such as Python, Java, Go, or Rust would have injected and executed the check automatically and implicitly, raising an error or crashing for invalid indices. Keep in mind that this is a trivial example where list’s size is known beforehand, at compilation time. Memory-safe languages can also handle cases where the size is only known at runtime. They achieve this with runtime information: basically, they keep the size of list in another variable.
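To make the "size in another variable" idea concrete, here is a rough C sketch of an array whose length is only known at runtime and travels together with its pointer. The type char_slice and its functions are hypothetical names invented for this example. They only approximate what a memory-safe runtime does, and unlike such a runtime, nothing forces C code to go through the checked accessors.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "checked array": the pointer is stored next to its length. */
typedef struct {
    char  *data;
    size_t len;
} char_slice;

/* Allocate `len` zeroed bytes; `len` may come from user input at runtime. */
char_slice char_slice_new(size_t len) {
    char_slice s = { calloc(len, 1), len };
    if (s.data == NULL) {
        s.len = 0; /* Allocation failed (or len was 0): nothing is readable. */
    }
    return s;
}

/* Checked read: fails where a memory-safe runtime would raise an error. */
bool char_slice_get(char_slice s, size_t index, char *out) {
    if (out == NULL || index >= s.len) {
        return false;
    }
    *out = s.data[index];
    return true;
}

/* Checked write, same rule as the read. */
bool char_slice_set(char_slice s, size_t index, char value) {
    if (index >= s.len) {
        return false;
    }
    s.data[index] = value;
    return true;
}

/* Release the buffer and reset the length so later accesses are refused. */
void char_slice_free(char_slice *s) {
    if (s == NULL) {
        return;
    }
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

For instance, after char_slice s = char_slice_new(3), a call to char_slice_get(s, 400, &c) returns false instead of reading 400 bytes past the start of the buffer.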
Memory-safety doesn’t stop at array bounds checking. There are other issues that are usually taken care of using a mix of: virtual machine (Python, Java, WebAssembly), garbage collection (Python, Java, Go), runtime abstractions (modern C++, Rust), and/or compile-time rules and checks (Rust).
In this course, we have decided to make you use Rust. The aim of this programming language is to keep maximum performance (identical to C) with no garbage collector or virtual machine while still being memory-safe.
So, the goal of this first practical session is to introduce you to several types of bugs that are often encountered in the C programming language and that can lead to vulnerabilities. To do this, you will be asked to complete several exercises designed to help you detect bugs in C programs and debug them. We’ll also discover how the Rust programming language avoids these pitfalls.
So, let’s write C, but…
Isn’t C a Dead Language?
Well, for better or for worse, C is not dead. The whole world runs on it.
More specifically, let’s check our favorite kernel (or, if not your favorite yet, the one that makes servers, and Android, work):
$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ scc linux
Language Files Lines Blanks Comments Code Complexity
───────────────────────────────────────────────────────────────────────────────
C 36322 25771328 3716880 2853379 19201069 2521084
C Header 26355 10362694 774845 1551184 8036665 57824
Assembly 1360 381699 42568 50352 288779 3489
Rust 338 135822 10993 35002 89827 9261
...
───────────────────────────────────────────────────────────────────────────────
Total 86869 41257447 5188536 4660857 31408054 2622443
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $1 423 929 234
Estimated Schedule Effort (organic) 217,14 months
Estimated People Required (organic) 582,59
What about the most reliable media player?
$ git clone https://code.videolan.org/videolan/vlc.git
$ scc vlc
Language Files Lines Blanks Comments Code Complexity
───────────────────────────────────────────────────────────────────────────────
C 1254 595881 85875 65071 444935 83892
C Header 852 136722 17191 51278 68253 3828
C++ 476 166643 23613 17074 125956 21161
C++ Header 431 46244 7711 12282 26251 583
Assembly 20 4850 449 435 3966 90
Rust 20 3221 344 594 2283 270
...
───────────────────────────────────────────────────────────────────────────────
Total 4513 1166304 165722 168953 831629 124108
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $31 442 815
Estimated Schedule Effort (organic) 50,99 months
Estimated People Required (organic) 54,79
Let’s not forget its internal library that can read whatever media file you throw at it:
$ git clone https://git.ffmpeg.org/ffmpeg.git
$ scc ffmpeg
Language Files Lines Blanks Comments Code Complexity
───────────────────────────────────────────────────────────────────────────────
C 3338 1523389 189760 122465 1211164 216203
C Header 1187 247514 21001 66554 159959 2954
Assembly 400 182553 14770 13003 154780 1926
...
───────────────────────────────────────────────────────────────────────────────
Total 5230 2005313 233212 205006 1567095 224570
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $61 156 893
Estimated Schedule Effort (organic) 65,65 months
Estimated People Required (organic) 82,76
Do you know Python?
$ git clone https://github.com/python/cpython.git
$ scc cpython
Language Files Lines Blanks Comments Code Complexity
───────────────────────────────────────────────────────────────────────────────
Python 2217 1089091 114654 91597 882840 87172
C Header 637 356738 31076 18746 306916 20635
C 485 653964 64822 80599 508543 104795
...
───────────────────────────────────────────────────────────────────────────────
Total 4955 2904156 367555 197045 2339556 221030
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $93 150 554
Estimated Schedule Effort (organic) 77,04 months
Estimated People Required (organic) 107,42
What about databases?
$ git clone https://git.postgresql.org/git/postgresql.git
$ scc postgresql
Language Files Lines Blanks Comments Code Complexity
───────────────────────────────────────────────────────────────────────────────
C 1584 1557272 187314 393892 976066 163484
C Header 989 200845 18181 65439 117225 2665
...
───────────────────────────────────────────────────────────────────────────────
Total 4579 2170000 261470 508166 1400364 173853
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $54 343 588
Estimated Schedule Effort (organic) 62,77 months
Estimated People Required (organic) 76,91
Machine learning?
$ git clone https://github.com/pytorch/pytorch
$ scc pytorch
Language Files Lines Blanks Comments Code Complexity
───────────────────────────────────────────────────────────────────────────────
Python 4296 2238376 245271 242866 1750239 169393
C Header 2264 394339 49378 61164 283797 20369
C++ 2152 849234 83222 64656 701356 78927
C 193 41985 4149 2662 35174 3631
C++ Header 67 12602 2031 1931 8640 506
Assembly 34 9603 1420 410 7773 25
...
───────────────────────────────────────────────────────────────────────────────
Total 11251 3964150 441933 395132 3127085 292013
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $126 325 782
Estimated Schedule Effort (organic) 86,49 months
Estimated People Required (organic) 129,76
And last, but not least, let’s have a look at the 60 GiB source code (don’t try this at home) of the base browser for Chrome, Edge, Brave, Opera, et al. (we don’t forget you either, Samsung Internet):
$ git clone https://chromium.googlesource.com/chromium/src
$ scc src
Language Files Lines Blanks Comments Code Complexity
───────────────────────────────────────────────────────────────────────────────
C++ 66418 21007579 2940888 1954168 16112523 1165580
C Header 58223 6637079 1186021 1454297 3996761 88573
Rust 5147 2225829 145434 332292 1748103 125898
C 1551 855738 110201 124513 621024 90471
C++ Header 195 19990 3714 2128 14148 406
...
───────────────────────────────────────────────────────────────────────────────
Total 359565 54543287 6579284 5691291 42272712 2085308
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $1 945 173 939
Estimated Schedule Effort (organic) 244,47 months
Estimated People Required (organic) 706,90

Without counting C++, that’s already 49,335,488 lines of C code (or $3,735,522,805, but please don’t trust that number). And the list goes on and on. Also, this is only free and open source software. Indeed, the Windows kernel itself is written in C, and the same goes for macOS, iOS, vending machines (probably), your PS5, etc. Heck, even your car probably makes HTTP requests using C. New C code keeps getting written and old C code keeps getting fixed everywhere. We have to deal with it. That makes C a relevant language for developers.
Today, C is still one of the most widespread, performant, and close-to-hardware languages that one could use. It is simple to learn but hard to master. It is the perfect way to understand how a computer and its operating system work. And it inspired so many languages that came after it. That makes C a foundational language for computer science.
Finally, it’s easy to “shoot yourself in the foot” using C. Indeed, security is far from built-in because it wasn’t even a concern at the time C was invented (1972). And with all this written-and-running software, many developers actually shot themselves in the foot. That makes C a critical language for cybersecurity. So let’s learn from its past mistakes and try solving them.