Debugging and Profiling #

Lecture source: https://missing.csail.mit.edu/2020/debugging-profiling/

Syscalls #

Use strace to see which syscalls a program makes. For example, trace the lstat calls that ls -l / issues:

$ sudo strace -e lstat ls -l / > /dev/null
lstat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("swapfile", {st_mode=S_IFREG|0600, st_size=1989529600, ...}) = 0
lstat("bin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("lib64", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("vmlinuz", {st_mode=S_IFLNK|0777, st_size=31, ...}) = 0
lstat("initrd.img", {st_mode=S_IFLNK|0777, st_size=34, ...}) = 0
lstat("home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
lstat("var", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("sys", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
lstat("dev", {st_mode=S_IFDIR|0755, st_size=3800, ...}) = 0
lstat("root", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
lstat("initrd.img.old", {st_mode=S_IFLNK|0777, st_size=33, ...}) = 0
lstat("etc", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("run", {st_mode=S_IFDIR|0755, st_size=1020, ...}) = 0
lstat("usr", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("media", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("lost+found", {st_mode=S_IFDIR|0700, st_size=16384, ...}) = 0
lstat("opt", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=4096, ...}) = 0
lstat("sbin", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
lstat("snap", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("vmlinuz.old", {st_mode=S_IFLNK|0777, st_size=30, ...}) = 0
lstat("boot", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("mnt", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("srv", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
+++ exited with 0 +++

Profilers #

Use line_profiler with kernprof to get the running time of every line in your Python code.

You can also get timing information (at the function level, rather than per line) from Python’s built-in cProfile.
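
For example, a minimal sketch assuming a script sorts.py (a hypothetical name) whose hot functions are decorated with @profile, the marker kernprof looks for:

$ pip install line_profiler
$ kernprof -l -v sorts.py                  # line-by-line timings via line_profiler
$ python -m cProfile -s tottime sorts.py   # function-level timings, sorted by total time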

Use memory_profiler to investigate the memory usage of each line of your Python code.
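
A minimal sketch, again assuming the functions of interest in the hypothetical sorts.py carry the @profile decorator:

$ pip install memory_profiler
$ python -m memory_profiler sorts.py       # per-line memory usage of the decorated functions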

Use perf for event profiling; it can trace a lot of events (CPU cycles, cache misses, page faults, ...), so dig in and use it!
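
A few starting points (./myprogram is a placeholder for whatever you want to profile):

$ perf list                    # enumerate the events perf can trace on this machine
$ perf stat ./myprogram        # basic performance counters (cycles, instructions, ...)
$ perf record -g ./myprogram   # sample call stacks into perf.data
$ perf report                  # browse the recorded samples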

Use FlameGraph to visualize profiling data (from perf, etc.).
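
A sketch of the usual perf-to-FlameGraph pipeline, assuming you have cloned https://github.com/brendangregg/FlameGraph and its scripts are in the current directory:

$ perf record -g ./myprogram
$ perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg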

If you are working with Python, you can use pycallgraph to visualize the function call graph together with some profiling information.

If you are only interested in the static function call graph, give code2flow a try.
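
For example, on the fib.py script from the exercises below (output file names may differ between tool versions):

$ pip install pycallgraph
$ pycallgraph graphviz -- ./fib.py   # runs the script and writes pycallgraph.png; needs Graphviz (dot)
$ code2flow fib.py                   # builds a call graph statically, without running the code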

Resource Monitoring #

IMO, this is the critical part of profiling: profiling is all about resources, and monitoring them is where profiling begins.

General monitoring:

  • htop: an improved version of top.
  • sar: monitor CPU, memory, network I/O, etc. (quick example below).
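
For example (the sampling interval and count are arbitrary):

$ htop          # interactive overview of processes, CPU, and memory
$ sar -u 1 5    # overall CPU utilization, one sample per second, five samples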

I/O:

  • iotop: display live I/O usage per process.

Disk usage:

  • df: disk usage per partition (filesystem).
  • du: disk usage per file or directory (quick example below).
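
For example:

$ df -h        # human-readable usage of each mounted filesystem
$ du -sh ./*   # summarized size of every entry in the current directory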

Memory:

  • free: display the amount of free and used memory in the system.

Open files (file descriptors):

  • lsof: list of open files.

Network connections statistics:

  • ss: use ss -s to show sockets usage summary.
  • lsof: since sockets are files on Linux, use lsof -i :<port_num> to find the processes holding a given port (examples below).
  • ip: a tool to show / manipulate routing, network devices, interfaces and tunnels.
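
For example (port 8080 is just a placeholder):

$ ss -s            # socket usage summary
$ ss -tlnp         # listening TCP sockets and the processes that own them
$ lsof -i :8080    # which process has port 8080 open
$ ip addr          # network interfaces and their addresses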

Network usage:

  • iftop
  • nethogs
  • sar: use sar -n DEV 1 to get the network I/O statistics.
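
For example (iftop and nethogs usually need root):

$ sudo iftop       # per-connection bandwidth usage
$ sudo nethogs     # per-process bandwidth usage
$ sar -n DEV 1     # per-interface I/O statistics, sampled every second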

Specialized Tools #

Use hyperfine to benchmark CLI (command-line interface) tools.
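
For example, comparing fd against find (assuming fd is installed), with a few warmup runs first:

$ hyperfine --warmup 3 'fd -e jpg' 'find . -iname "*.jpg"'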

Exercises #

Debugging #

  1. Use journalctl on Linux or log show on macOS to get the super user accesses and commands in the last day. If there aren’t any you can execute some harmless commands such as sudo ls and check again.


    $ journalctl -S "`date -d 'last day' '+%Y-%m-%d'`" | grep "root"
    
  2. Do this hands-on pdb tutorial to familiarize yourself with the commands. For a more in-depth tutorial read this.


    TODO
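
    A quick reference of the core pdb commands, sketched as a hypothetical session on a script buggy.py:

    $ python -m pdb buggy.py
    (Pdb) l            # list the source around the current line
    (Pdb) b 10         # set a breakpoint at line 10
    (Pdb) c            # continue until the next breakpoint
    (Pdb) n            # execute the next line (step over)
    (Pdb) s            # step into a function call
    (Pdb) p some_var   # print the value of some_var
    (Pdb) q            # quit the debugger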

  3. Install shellcheck and try checking the following script. What is wrong with the code? Fix it. Install a linter plugin in your editor so you can get your warnings automatically.

    #!/bin/sh
    ## Example: a typical script with several problems
    for f in $(ls *.m3u)
    do
      grep -qi hq.*mp3 $f \
        && echo -e 'Playlist $f contains a HQ file in mp3 format'
    done
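
    A possible fix, roughly what shellcheck suggests: iterate over the glob directly instead of parsing ls, quote the grep pattern and "$f", and replace echo -e with printf (using double quotes so $f actually expands):

      #!/bin/sh
      for f in ./*.m3u
      do
        grep -qi 'hq.*mp3' "$f" \
          && printf 'Playlist %s contains a HQ file in mp3 format\n' "$f"
      done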
    
  4. (Advanced) Read about reversible debugging and get a simple example working using rr or RevPDB.

Profiling #

  1. Here are some sorting algorithm implementations. Use cProfile and line_profiler to compare the runtime of insertion sort and quicksort. What is the bottleneck of each algorithm? Then use memory_profiler to check the memory consumption. Why is insertion sort better? Now check the in-place version of quicksort. Challenge: Use perf to look at the cycle counts and cache hits and misses of each algorithm.

  2. Here’s some (arguably convoluted) Python code for computing Fibonacci numbers using a function for each number.

    #!/usr/bin/env python
    def fib0(): return 0
       
    def fib1(): return 1
       
    s = """def fib{}(): return fib{}() + fib{}()"""
       
    if __name__ == '__main__':
       
        for n in range(2, 10):
            exec(s.format(n, n-1, n-2))
        # from functools import lru_cache
        # for n in range(10):
        #     exec("fib{} = lru_cache(1)(fib{})".format(n, n))
        print(eval("fib9()"))
    

    Put the code into a file and make it executable. Install prerequisites: pycallgraph and graphviz. (If you can run dot, you already have GraphViz.) Run the code as is with pycallgraph graphviz -- ./fib.py and check the pycallgraph.png file. How many times is fib0 called? We can do better than that by memoizing the functions. Uncomment the commented lines and regenerate the image. How many times are we calling each fibN function now?

  3. A common issue is that a port you want to listen on is already taken by another process. Let’s learn how to discover that process’s pid. First execute python -m http.server 4444 to start a minimal web server listening on port 4444. On a separate terminal run lsof | grep LISTEN to print all listening processes and ports. Find its pid and terminate it by running kill <PID>.

  4. Limiting a process’s resources can be another handy tool in your toolbox. Try running stress -c 3 and visualize the CPU consumption with htop. Now, execute taskset --cpu-list 0,2 stress -c 3 and visualize it. Is stress using three CPUs? Why not? Read man taskset. Challenge: achieve the same using cgroups. Try limiting the memory consumption of stress -m.

  5. (Advanced) The command curl ipinfo.io performs an HTTP request and fetches information about your public IP. Open Wireshark and try to sniff the request and reply packets that curl sent and received. (Hint: Use the http filter to just watch HTTP packets).