user3550605

Reputation: 23

How to access a process's kernel stack in linux kernel?

I am trying to monitor which functions a process calls during its execution. My aim is to know how much time the process spends in each function. The functions are pushed over a stack and popped when function call returns. I would like to know where in the kernel code this push and pop actually happens.

I found a void *stack field in task_struct. I am not sure if this is the field I am looking for. If it is, how can I find out how it is updated?

I have to write a kernel module that will make use of this. Any help would be appreciated.

Upvotes: 1

Views: 2114

Answers (1)

myaut

Reputation: 11494

The functions are pushed over a stack and popped when function call returns. I would like to know where in the kernel code this push and pop actually happens.

It doesn't happen in kernel code; it is done by the processor. For example, when an x86 CPU executes a call instruction, it pushes the return address (the IP of the next instruction) onto the stack, and the ret instruction pops that value back.
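To make the call/ret behaviour concrete, here is a toy Python model of it. This is only a sketch: the ToyCPU class, the addresses, and the fixed 5-byte instruction length (the size of a near call with a 32-bit offset) are illustrative assumptions, not anything taken from the kernel.

```python
class ToyCPU:
    """Toy model of how call/ret use the stack; not a real emulator."""

    def __init__(self, ip=0):
        self.ip = ip        # instruction pointer
        self.stack = []     # append = push, pop = pop

    def call(self, target, insn_len=5):
        # call: push the address of the NEXT instruction, then jump.
        self.stack.append(self.ip + insn_len)
        self.ip = target

    def ret(self):
        # ret: pop the saved return address back into the instruction pointer.
        self.ip = self.stack.pop()


if __name__ == "__main__":
    cpu = ToyCPU(ip=0x1000)
    cpu.call(0x2000)
    print(hex(cpu.ip), [hex(a) for a in cpu.stack])  # 0x2000 ['0x1005']
    cpu.ret()
    print(hex(cpu.ip), cpu.stack)                    # 0x1005 []
```

No kernel function is involved in either step, which is why there is no place in the kernel source where this push and pop "happens".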

You can patch every call and ret instruction in the kernel with call my_tracing_routine, record the instruction pointer there, and then pass control to the original callee/caller. There are tools for that: LTTng, SystemTap, and in-kernel interfaces such as kprobes and ftrace. This approach is called tracing.
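As a rough user-space analogy of tracing (not kernel code), Python lets you install a hook that runs on every function entry and exit via sys.setprofile. The sketch below keeps a shadow stack of entry timestamps and sums the inclusive time per function; trace_times, inner and outer are hypothetical names made up for this illustration.

```python
import sys
import time
from collections import defaultdict


def trace_times(func, *args):
    """Run func(*args); return {function name: inclusive seconds}."""
    totals = defaultdict(float)
    starts = []  # shadow stack of (function name, entry timestamp)

    def hook(frame, event, arg):
        if event == "call":            # a Python function was entered
            starts.append((frame.f_code.co_name, time.perf_counter()))
        elif event == "return" and starts:  # a Python function is leaving
            name, t0 = starts.pop()
            totals[name] += time.perf_counter() - t0

    sys.setprofile(hook)
    try:
        func(*args)
    finally:
        sys.setprofile(None)           # always remove the hook
    return dict(totals)


def inner():
    time.sleep(0.01)


def outer():
    inner()
    inner()


if __name__ == "__main__":
    for name, secs in trace_times(outer).items():
        print(f"{name}: {secs:.4f} s")
```

The hook fires on every single call, which is exactly why this style of measurement gets expensive when applied to everything.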

But if you patch every function, e.g. with the SystemTap probe kernel.function("*"), you will kill performance and will probably panic the system. So you can't measure every function call, but you can measure only a sample of them and hope that the results are representative. For that you need a large sample (e.g. run the program for a couple of minutes). That is called profiling.

Linux ships with a profiler, perf:

# perf record -- dd if=/dev/zero of=/dev/null
...
^C

# perf report
9.75%  dd  [kernel.kallsyms]  [k] __clear_user
6.69%  dd  [kernel.kallsyms]  [k] __audit_syscall_exit
5.61%  dd  [kernel.kallsyms]  [k] fsnotify
4.73%  dd  [kernel.kallsyms]  [k] system_call_after_swapgs
4.37%  dd  [kernel.kallsyms]  [k] system_call
...

You may also use -g to collect call chains. By default perf uses CPU performance counters: after N CPU cycles an interrupt is raised, and the perf handler (which is already built into the kernel) saves the IP.
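The same sampling idea can be sketched in user space. This is only an analogy of what perf does, not its implementation: instead of a performance-counter interrupt it uses the ITIMER_PROF interval timer, which delivers SIGPROF after a given amount of CPU time, and instead of saving the IP it records the name of the Python function that was running. sample_profile and spin are hypothetical names, and it has to run in the main thread on a Unix-like system.

```python
import signal
from collections import Counter


def sample_profile(func, interval=0.001):
    """Run func() and count which function was running at each SIGPROF."""
    samples = Counter()

    def on_sigprof(signum, frame):
        # frame is whatever Python code was executing when the timer fired.
        if frame is not None:
            samples[frame.f_code.co_name] += 1

    old = signal.signal(signal.SIGPROF, on_sigprof)
    # Fire after `interval` seconds of CPU time, then repeat.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop the timer first
        signal.signal(signal.SIGPROF,
                      old if old is not None else signal.SIG_DFL)
    return samples


def spin():
    # CPU-bound busy work so that the CPU-time timer actually fires.
    total = 0
    for i in range(5_000_000):
        total += i
    return total


if __name__ == "__main__":
    for name, count in sample_profile(spin).most_common():
        print(f"{count:6d}  {name}")
```

The cost is one handler invocation per timer tick rather than one per function call, which is why sampling stays cheap where full tracing does not.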

If you wish to collect stacks, you may do that with SystemTap:

# stap --all-modules -e '
    probe timer.profile {
        if (execname() == "dd") {
            println("----");
            print_backtrace();
        }
    }' -c 'dd if=/dev/zero of=/dev/null'
...
    ----
0xffffffff813e714d : _raw_spin_unlock_irq+0x32/0x3c [kernel]
0xffffffff81047bb9 : spin_unlock_irq+0x9/0xb [kernel]
0xffffffff8104ac68 : get_signal_to_deliver+0x4f0/0x528 [kernel]
0xffffffff8100216f : do_signal+0x48/0x4b1 [kernel]
0xffffffff81002608 : do_notify_resume+0x30/0x63 [kernel]
0xffffffff813edd6a : int_signal+0x12/0x17 [kernel]

In this example SystemTap uses the timer.profile probe, which attaches to the perf event cpu-clock. To do so, it generates, builds and loads a kernel module. You can check that with stap -k -p 3.

Upvotes: 3
