Reputation: 1362
I have a buggy kernel module which I am trying to fix. Basically, when this module is running, it causes other tasks to hang for more than 120 seconds. Since almost all the hung tasks are waiting for either mm->mmap_sem or some file system lock (inode->i_mutex), I suspect that my module does not grab mmap_sem and some file-system level lock (like inode->i_mutex) in the right order, which could have caused a deadlock. My module does not try to grab those locks directly, though, so I assume some function I call grabs them. Now I am trying to figure out which function calls in my module are causing the problem.
However, I am having a hard time debugging it for the following reasons:
I don't know exactly which lock the hung task is trying to grab. I got the call trace of the hung task, and I know at what point it hangs. The kernel also gives me some information like: "1 lock held by automount/3115: 0: (&type->i_mutex_dir_key#2){--..}, at: [] real_lookup+0x24/0xc5". However, I want to know exactly which lock a task holds, and exactly which lock it is trying to acquire, in order to figure out the problem. As the kernel doesn't provide the arguments of function calls along with the call trace, I find this information difficult to obtain.
I am using gdb and VMware to debug this, which allows me to set breakpoints, step into a function and so on. However, since which task will hang, and at what point, is non-deterministic, I don't really know where to set breakpoints and inspect. It would be great if I could somehow "attach" to the task which the kernel reported as blocked for more than 120 seconds, and get some information about it.
So my questions are as follows:
1) Where can I get, along with the call trace, the arguments of the functions in the call trace, in order to figure out exactly which lock a task is trying to grab?
2) Is it possible for me to use gdb to somehow "attach" to a hung task in the kernel? If not, is there some way for me to at least examine the data structure which represents that task? I am having a hard time examining global data structures in the kernel too: GDB always complains that "can't access memory 0x3200" or something similar.
3) It would also be very helpful if I could print out, for every task in the kernel, which locks it is currently holding. Is there a way to do that?
Thank you very much!
Upvotes: 5
Views: 13091
Reputation: 11609
The kernel feature lockdep can help you in this regard. Check out my post on how to use it in your kernel: How to use lockdep feature in linux kernel for deadlock detection
Upvotes: 3
Reputation: 573
Let me try.
1) Try KGDB
2) You mean a hung process? http://www.ibm.com/developerworks/aix/library/au-unix-strace.html
3) Try the lsof package maybe.
Upvotes: 1
Reputation: 15218
Not answering your question directly, but hopefully this is more helpful: the Linux kernel has a built-in, heavy-duty lock validator called lockdep. Turn it on and let it run. If you have a lock ordering problem, it is likely to catch it and give you a detailed report.
See: http://www.mjmwired.net/kernel/Documentation/lockdep-design.txt
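As a rough sketch of what "turn it on" can look like, the fragment below lists kernel configuration options commonly associated with lockdep and hung-task reporting. This is an assumption about a typical setup rather than a recipe for your exact kernel: option names and availability vary between kernel versions, so check the "Kernel hacking" menu of your own tree.

```
# Lock validator: checks lock ordering at runtime and reports inversions
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCK_ALLOC=y
# Extra sanity checks on mutexes and spinlocks
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
# "blocked for more than 120 seconds" reports (newer kernels)
CONFIG_DETECT_HUNG_TASK=y
# SysRq key support, for dumping state on demand
CONFIG_MAGIC_SYSRQ=y
# Symbols for gdb/KGDB sessions
CONFIG_DEBUG_INFO=y
```

With lockdep compiled in and SysRq enabled, the SysRq `d` command (show all locks held) prints the locks currently held by every task, and SysRq `w` dumps the tasks in uninterruptible (blocked) state, which covers the third question above. Lockdep tracks lock classes rather than function arguments, so its report names the class of each lock held and the class being acquired.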
Upvotes: 3