Tracepoints and kernel modules

Tue 15 March 2022 | tags: Linux

While debugging, it’s invaluable to be able to trace functions in the Linux kernel. There are several tools for that task: perf, ftrace (with helper tool like trace-cmd), bpftrace, etc.

For my own debug of kernel internals and in the last few years to debug the i915 module for Intel graphics, I usually use tracepoints or dynamic tracepoints (with perf probe) as the trigger for something I want to trace.

One question I get some times is “how do I trace the initial probe of a kernel module?”. At that time you (usually) don’t yet have the module loaded and as a consequence, the symbols of that module are still not available for using tracepoints. For example:

# trace-cmd list -e '^i915.*'
#

One thing to realize is that a lot of time we don’t want to debug the module loading, but rather the module binding to a device. These are 2 separate things, that happen in a cascade of events. It probably also adds to the confusion that the userspace tools to load the modules are not named very consistently: modprobe and insmod, and they end up calling the [f]init_module syscall.

The various buses available in the kernel have mechanisms for stopping the autoprobe (and hence binding to a device), when a module is loaded. With i915, all we have to do is to set a few things in the /sys/bus/pci/:

# echo 0 > /sys/bus/pci/drivers_autoprobe
# modprobe i915

With the i915 module, but not attached to any device, we are ready to attach to any tracepoint or create new dynamic probes. Let’s suppose I want to debug the i915_driver_probe()() function, and any function called by it during the initialization. This is one of the functions to initialize the GPU in i915 called when we are binding to a device.

F=i915_driver_probe

echo 0 > /sys/bus/pci/drivers_autoprobe 
modprobe i915

cd /sys/kernel/debug/tracing
cat /dev/null >  trace
echo $F > set_graph_function
echo _printk > set_graph_notrace
echo 10 > max_graph_depth
echo function_graph > current_tracer
echo 1 > tracing_on

echo 0000:03:00.0 | tee /sys/bus/pci/drivers/i915/bind
cp trace /tmp/trace.txt

echo 0 > tracing_on
cat /dev/null >  trace

With the snippet above we will start tracing whenever function i915_driver_probe() is executed. We also set a few additional parameters: set the maximum call depth to graph, disable graphing on printk() since it’s usually not very interesting. Depending on the size of your trace you may also need to increase the buffer_size_kb to avoid the ring buffer to rotate, or have something to pump data out off the ringbuffer to a file.

Even after we enable tracing by echo’ing 1 to the tracing_on file, nothing happens as we are not automatically binding to the device. In the snippet above we bind it manually, by writting the pci slot to the /sys/bus/pci/drivers/i915/bind file. We then should start seeing the function to be traced:

# head /tmp/trace.txt
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
  3)               |  i915_driver_probe [i915]() {
  3)               |    __devm_drm_dev_alloc() {
  3)               |      __kmalloc() {
  3)               |        kmalloc_order_trace() {
  3)               |          kmalloc_order() {
  3)               |            __alloc_pages() {

Another thing very useful to do is to attach to a specific tracepoint. For example, to trace the MMIO read/write we can do the commands below. Now I’m using trace-cmd directly and omitting the device bind, that we should do after start recording:

# trace-cmd record -e i915:i915_reg_rw
Hit Ctrl^C to stop recording
^C
# trace-cmd report
...
tee-2239  [009]   778.693172: i915_reg_rw:          read reg=0x51000, len=4, val=(0xc1900, 0x0)
tee-2239  [009]   778.693174: i915_reg_rw:          write reg=0xc6204, len=4, val=(0x10251000, 0x0)
tee-2239  [009]   778.693796: i915_reg_rw:          read reg=0xa26c, len=4, val=(0x4, 0x0)
tee-2239  [009]   778.693805: i915_reg_rw:          read reg=0xd00, len=4, val=(0x14, 0x0)
tee-2239  [009]   778.693808: bprint:               intel_gt_init_clock_frequency: 0000:03:00.0 Using clock frequency: 19200kHz, period: 53ns, wrap: 113816ms
tee-2239  [009]   778.693817: i915_reg_rw:          read reg=0x913c, len=4, val=(0xff00, 0x0)
tee-2239  [009]   778.693825: i915_reg_rw:          read reg=0x9144, len=4, val=(0xff00, 0x0)
tee-2239  [009]   778.693908: i915_reg_rw:          read reg=0x9134, len=4, val=(0xff, 0x0)
tee-2239  [009]   778.693918: i915_reg_rw:          read reg=0x9118, len=4, val=(0xd02, 0x0)
tee-2239  [009]   778.693933: i915_reg_rw:          read reg=0x9140, len=4, val=(0x30005, 0x0)
tee-2239  [009]   778.693941: i915_reg_rw:          read reg=0x911c, len=4, val=(0x3000000, 0x0)

blogroll

social