Reputation: 1320
I have a client-server program with multiple threads in both the server and the client. The number of clients and servers varies (for example, 3 server replicas and 10 clients). I am debugging a source file in this program, and I think there is some kind of deadlock, possibly the following:
A mutex lock is already held by a server method and a request from the client invokes a server method which wants to acquire the mutex again.
The program is launched by a test script which spawns the servers and clients and makes the clients send specific requests to the servers. I put the following code in the suspicious area to see if there is a deadlock, but it doesn't seem to work, i.e. the code enters neither branch:
if (pthread_mutex_lock(&a_mutex) == EDEADLK) {
    cout << "couldnt acquire lock." << endl;
} else {
    cout << "acquired lock" << endl;
}
I tried to debug with gdb by attaching to one running server process. I added "display" and "watch" (in different gdb runs) for a_mutex. I get output of the following form:
1: a_mutex = {__data = {__lock = 2, __count = 0, __owner = 4193, __kind = 0, __nusers = 2,
{__spins = 0, __list = {__next = 0x0}}},
__size = "\002\000\000\000\000\000\000\000a\020\000\000\000\000\000\000\002\000\000 \000\000\000\000", __align = 2}
I don't know the meaning of everything in the above output, but I can see that a thread (4193) is holding the mutex. Here is the backtrace of that thread (snipped):
#0 0xb8082430 in __kernel_vsyscall ()
#1 0xb7e347a6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7e345be in sleep () from /lib/tls/i686/cmov/libc.so.6
#3 0x0804cb59 in class1::method1 (this=0xbfa9fe6c, clt=1, id=
{static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0xb7c9c11c "l/%\b"}})
at file1.cc:33
I don't know what the bug is or where it is.
I would highly appreciate any help with the following questions:
PS: I have read this question already.
Thank you very much.
Upvotes: 4
Views: 3181
Reputation: 741
Attach GDB to the hung program, then run "thread apply all bt" (I think that's the command, but I don't have a system handy).
It'll give you a backtrace of all of the threads and you should be able to see which thread is doing what.
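If attaching gdb is awkward because the test script spawns everything, one alternative is to make the suspicious lock give up after a while and log that it did. This is only a minimal sketch, assuming a POSIX system that provides pthread_mutex_timedlock; lock_with_timeout is a hypothetical helper name, not something from your code:

```cpp
#include <pthread.h>
#include <cerrno>
#include <ctime>

// Hypothetical helper: try to take `m`, giving up after `seconds`.
// Returns 0 if the lock was acquired, ETIMEDOUT if it was still held
// by someone else when the deadline passed, or another pthread error code.
int lock_with_timeout(pthread_mutex_t* m, int seconds) {
    timespec deadline;
    // pthread_mutex_timedlock expects an absolute CLOCK_REALTIME time,
    // not a relative duration.
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += seconds;
    return pthread_mutex_timedlock(m, &deadline);
}
```

If lock_with_timeout(&a_mutex, 5) returns ETIMEDOUT in the suspicious area, you know that thread was stuck waiting, and you can log the thread id before deciding what to do next. On success the caller still has to call pthread_mutex_unlock as usual.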
If the problem is easily reproducible, you can also use strace to get some information on which threads are blocked waiting on a lock (on Linux these waits show up as futex system calls).
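As for why the if/else in the question prints nothing: as far as I know, pthread_mutex_lock only returns EDEADLK for a mutex created with the PTHREAD_MUTEX_ERRORCHECK type. The `__kind = 0` in your gdb output looks like a default mutex, and for that type the call simply blocks until the lock is free, so neither branch is reached. A rough sketch of an error-checking mutex follows; init_errorcheck_mutex and relock_same_thread are made-up names for illustration:

```cpp
#include <pthread.h>
#include <cerrno>
#include <cassert>

// Initialise `m` as an error-checking mutex. Returns 0 on success,
// otherwise the pthread error code.
int init_errorcheck_mutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Lock the same error-checking mutex twice from one thread and return
// what the second pthread_mutex_lock call reported.
int relock_same_thread() {
    pthread_mutex_t m;
    int rc = init_errorcheck_mutex(&m);
    assert(rc == 0);
    rc = pthread_mutex_lock(&m);
    assert(rc == 0);
    // With the error-checking type this fails instead of blocking forever.
    int second = pthread_mutex_lock(&m);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

Here relock_same_thread() should return EDEADLK. Note that this only catches a thread relocking a mutex it already owns. Your backtrace shows the owner (4193) sitting in sleep() inside class1::method1, so if the waiter is a different thread, the error-checking type will not report anything and the waiter will still just block.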
Upvotes: 4