Reputation: 23
I am trying to monitor which functions are called by a process during its execution. My aim is to know how much time a process spends in every function. The functions are pushed onto a stack and popped when a function call returns. I would like to know where in the kernel code this push and pop actually happen.
I found a void *stack field in task_struct. I am not sure if this is the field I am looking for. If it is, how can I find out how it is updated?
I have to write a module that will make use of this code. Please help me with this.
Upvotes: 1
Views: 2114
Reputation: 11494
The functions are pushed onto a stack and popped when a function call returns. I would like to know where in the kernel code this push and pop actually happen.
It doesn't happen in kernel code; it is done by the processor. When an x86 CPU executes a call instruction, it pushes the IP (instruction pointer) onto the stack, and the ret instruction pops that value back.
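You can observe this from user space: GCC's __builtin_return_address() reads the return address that call pushed onto the stack, the same value that ret will later pop. A minimal illustration (not kernel code, just a demonstration of the mechanism):

#include <stdio.h>

/* Prints the return address that the `call` instruction pushed onto the
 * stack; the matching `ret` pops that same value back into the
 * instruction pointer. */
__attribute__((noinline)) void callee(void)
{
    printf("return address pushed by call: %p\n",
           __builtin_return_address(0));
}

int main(void)
{
    callee();   /* the compiler emits `call callee` here */
    return 0;
}

Compile with gcc and run; the printed address points into main, just after the call site.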
You can patch every call and ret instruction in the kernel with call my_tracing_routine, record the instruction pointer there, then pass control back to the original callee/caller. There are tools for that: LTTng, SystemTap, and in-kernel interfaces like kprobes, ftrace, and so on. This approach is called tracing.
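Since you mention writing a module: the simplest of these in-kernel interfaces to start with is kprobes. Here is a minimal sketch of a kprobe module that logs the instruction pointer whenever one function is entered. The probed symbol do_sys_open is only an example; pick whatever function you actually care about, and treat this as a starting point rather than a complete time-accounting solution.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/ptrace.h>
#include <linux/kprobes.h>

/* Function to hook -- an example, adjust to the function you want to trace. */
static struct kprobe kp = {
    .symbol_name = "do_sys_open",
};

/* Runs just before the probed instruction executes. */
static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
    pr_info("kprobe hit: %s, ip = %p\n",
            p->symbol_name, (void *)instruction_pointer(regs));
    return 0;
}

static int __init kprobe_init(void)
{
    kp.pre_handler = handler_pre;
    return register_kprobe(&kp);
}

static void __exit kprobe_exit(void)
{
    unregister_kprobe(&kp);
}

module_init(kprobe_init);
module_exit(kprobe_exit);
MODULE_LICENSE("GPL");

Build it with a standard kernel-module Makefile, load it with insmod, and watch dmesg. For measuring how long a process spends inside a function, a kretprobe (entry plus return handler) would be the natural next step.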
But if you patch every function, e.g. with the SystemTap probe kernel.function("*"), you will kill performance and probably panic the system. So you cannot measure every function call, but you can measure every Nth call and hope the results are representative; for that you need a large sample (i.e. run the program for a couple of minutes). That approach is called profiling.
Linux ships with the perf profiler:
# perf record -- dd if=/dev/zero of=/dev/null
...
^C
# perf report
9.75% dd [kernel.kallsyms] [k] __clear_user
6.69% dd [kernel.kallsyms] [k] __audit_syscall_exit
5.61% dd [kernel.kallsyms] [k] fsnotify
4.73% dd [kernel.kallsyms] [k] system_call_after_swapgs
4.37% dd [kernel.kallsyms] [k] system_call
...
You may also pass -g to collect call chains. By default perf uses CPU performance counters: after every N CPU cycles an interrupt is raised, and the perf handler (which is already built into the kernel) saves the IP.
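If you want to use those performance counters from your own code rather than through the perf tool, the underlying interface is the perf_event_open(2) syscall. Here is a minimal sketch in counting mode (not sampling); the busy loop is just a placeholder workload, and counting kernel-side cycles may require root or a relaxed perf_event_paranoid setting:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/perf_event.h>

/* glibc has no wrapper for perf_event_open, so invoke the raw syscall. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    long long count;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;  /* count CPU cycles */
    attr.disabled = 1;                       /* start disabled, enable explicitly */

    fd = perf_event_open(&attr, 0 /* this process */, -1, -1, 0);
    if (fd == -1) {
        perror("perf_event_open");
        return 1;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    for (volatile long i = 0; i < 10000000; i++)
        ;                                    /* the code being measured */

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    read(fd, &count, sizeof(count));
    printf("CPU cycles: %lld\n", count);
    close(fd);
    return 0;
}

This is the same interface perf record builds on; for sampling instead of counting, perf sets attr.sample_period and reads samples from an mmap()ed ring buffer.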
If you wish to collect stacks, you may do that with SystemTap:
# stap --all-modules -e '
    probe timer.profile {
        if (execname() == "dd") {
            println("----");
            print_backtrace();
        }
    }' -c 'dd if=/dev/zero of=/dev/null'
...
----
0xffffffff813e714d : _raw_spin_unlock_irq+0x32/0x3c [kernel]
0xffffffff81047bb9 : spin_unlock_irq+0x9/0xb [kernel]
0xffffffff8104ac68 : get_signal_to_deliver+0x4f0/0x528 [kernel]
0xffffffff8100216f : do_signal+0x48/0x4b1 [kernel]
0xffffffff81002608 : do_notify_resume+0x30/0x63 [kernel]
0xffffffff813edd6a : int_signal+0x12/0x17 [kernel]
In this example SystemTap uses the timer.profile probe, which attaches to the perf cpu-clock event. To do so, it generates, builds, and loads a kernel module; you can check that with stap -k -p 3
Upvotes: 3