Derui Si

Reputation: 1105

Linux: How can I find the thread which holds a particular lock?

I have a multi-threaded program running on Linux. Sometimes, when I run gstack against it, one thread has been waiting for a lock for a long time (say, 2-3 minutes):

Thread 2 (Thread 0x5e502b90 (LWP 19853)):
#0  0x40000410 in __kernel_vsyscall ()
#1  0x400157b9 in __lll_lock_wait () from /lib/i686/nosegneg/libpthread.so.0
#2  0x40010e1d in _L_lock_981 () from /lib/i686/nosegneg/libpthread.so.0
#3  0x40010d3b in pthread_mutex_lock () from /lib/i686/nosegneg/libpthread.so.0
...

I checked the rest of the threads, and none of them appeared to hold this lock. However, after a while this thread (LWP 19853) acquired the lock successfully.

Some thread must have already acquired this lock, but I failed to find it. Is there anything I am missing?

EDIT: The definition of the pthread_mutex_t:

typedef union
{
    struct __pthread_mutex_s {
        int __lock;
        unsigned int __count;
        int __owner;
        /* KIND must stay at this position in the structure to maintain binary compatibility. */
        int __kind;
        unsigned int __nusers;
        __extension__ union { int __spins; __pthread_slist_t __list; };
    } __data;
    char __size[__SIZEOF_PTHREAD_MUTEX_T];
    long int __align;
} pthread_mutex_t;

There is a member "__owner"; it is the ID of the thread that currently holds the mutex.
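As a rough sketch of how that field could be inspected from inside the program: the code below assumes glibc on Linux, where __data.__owner holds the kernel thread ID of the holder (the LWP number in the gstack output) and 0 when the mutex is free. That field is a private implementation detail, not part of POSIX, and the helper names mutex_owner_tid and owner_field_tracks_holder are made up for illustration.

```cpp
#include <cassert>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Kernel thread id of the caller; this is the LWP number that gdb/gstack print.
pid_t current_tid() {
    return static_cast<pid_t>(syscall(SYS_gettid));
}

// Assumption: glibc's layout, where __data.__owner is the holder's kernel
// thread id, or 0 when nobody holds the mutex. Not a POSIX interface.
pid_t mutex_owner_tid(const pthread_mutex_t *m) {
    return m->__data.__owner;
}

// Hypothetical self-check: lock a mutex, confirm __owner names this thread
// while it is held, and confirm the field is cleared again after unlocking.
bool owner_field_tracks_holder() {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int rc = pthread_mutex_lock(&m);
    assert(rc == 0);
    (void)rc;
    bool held = (mutex_owner_tid(&m) == current_tid());
    pthread_mutex_unlock(&m);
    bool released = (mutex_owner_tid(&m) == 0);
    pthread_mutex_destroy(&m);
    return held && released;
}
```

In a debugger the same value can usually be read directly, for example by printing the mutex's __data.__owner in gdb and comparing it with the LWP numbers in the backtrace.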

Upvotes: 4

Views: 4973

Answers (4)

Rafael Baptista

Reputation: 11499

Mutexes by default don't track the thread that locked them. (Or at least I don't know of such a thing.)

There are two ways to debug this kind of problem. One way is to log every lock and unlock. On every thread creation you log the ID of the thread that got created. Right after locking any lock, you log the thread ID and the name of the lock that was locked (you can use file/line for this, or assign a name to each lock). And you log again right before unlocking any lock.

This is a fine way to do it if your program doesn't have tens of threads or more. After that the logs start to become unmanageable.
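A minimal sketch of that logging approach, assuming pthread mutexes and Linux (for the gettid system call). The names LOGGED_LOCK, LOGGED_UNLOCK and format_lock_event are invented for this example; the lock's variable name and file/line serve as its "name".

```cpp
#include <cassert>
#include <cstdio>
#include <pthread.h>
#include <string>
#include <sys/syscall.h>
#include <unistd.h>

// Kernel thread id (LWP), so log lines can be matched against gstack output.
long current_tid() { return syscall(SYS_gettid); }

// Builds one log line of the form "[tid N] <event> <lock> at <file>:<line>".
std::string format_lock_event(const char *event, const char *lock_name,
                              const char *file, int line) {
    assert(event && lock_name && file);
    char buf[256];
    std::snprintf(buf, sizeof buf, "[tid %ld] %s %s at %s:%d",
                  current_tid(), event, lock_name, file, line);
    return buf;
}

// Log right after acquiring the lock...
#define LOGGED_LOCK(m)                                                       \
    do {                                                                     \
        pthread_mutex_lock(&(m));                                            \
        std::fprintf(stderr, "%s\n",                                         \
                     format_lock_event("locked", #m, __FILE__, __LINE__)     \
                         .c_str());                                          \
    } while (0)

// ...and right before releasing it.
#define LOGGED_UNLOCK(m)                                                     \
    do {                                                                     \
        std::fprintf(stderr, "%s\n",                                         \
                     format_lock_event("unlocking", #m, __FILE__, __LINE__)  \
                         .c_str());                                          \
        pthread_mutex_unlock(&(m));                                          \
    } while (0)
```

The last "locked" line for a given lock without a matching "unlocking" line points at the thread that still holds it.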

The other way is to wrap your lock in a class that stores the thread id in a lock object right after each lock. You might even create a global lock registry that tracks this, that you can print out when you need to.

Something like:

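#include <mutex>
#include <thread>

// Debug-only wrapper that remembers which thread currently holds the mutex.
// std::mutex and std::thread::id are used as the system mutex and thread id types.
class MyMutex
{
public:
    void lock()   { mMutex.lock(); mLockingThread = std::this_thread::get_id(); }
    void unlock() { mLockingThread = std::thread::id(); mMutex.unlock(); }
    std::mutex      mMutex;
    std::thread::id mLockingThread;  // default-constructed id means "not locked"
};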

The key here is: don't implement either of these methods in your release version. Either a global locking log or a global registry of lock states creates a single global resource that will itself become subject to lock contention.

Upvotes: 2

alk

Reputation: 70893

For debugging issues like this, you might add special logging calls to your program that record when each thread acquired the lock and when it released it.

Such log entries will then help you find which thread acquired the lock last.

However, doing so might massively change the run-time behavior of the program, so the issue being debugged might no longer appear. It would then reveal itself as a classic heisenbug, of the kind often seen in multi-threaded applications.

Upvotes: 0

Jens Gustedt

Reputation: 78903

2-3 minutes sounds like a lot, but if your system is under heavy load, there is no guarantee that your thread wakes up immediately after another one has unlocked the mutex. So there might simply be no thread (anymore) that holds the lock at the moment you are looking at it.

Linux mutexes work in two stages. Roughly:

  • At the first stage there is an atomic CAS operation on an int value to see if the mutex can be locked immediately.
  • If this is not possible, a futex_wait system call is made, passing the address of the same int to the kernel.

An unlock operation then consists of changing the value back to the initial value (usually 0) and doing a futex_wake system call. The kernel then checks whether someone registered a futex_wait call on the same address, and puts those threads back into the scheduling queue. Which thread really gets woken up, and when, depends on different things, in particular the scheduling policy that is enabled. There is no guarantee that threads obtain the locks in the order they requested them.
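The two stages can be sketched roughly as follows. This is not glibc's implementation, just a simplified futex-based mutex in the style of the well-known three-state design (0 = free, 1 = locked, 2 = locked with possible waiters), assuming Linux and that std::atomic<int> has the same layout as int. The class name FutexMutex and the helper contended_count are made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <thread>
#include <unistd.h>
#include <vector>

class FutexMutex {
    std::atomic<int> state_{0};  // 0 = free, 1 = locked, 2 = locked, maybe waiters
    static_assert(sizeof(std::atomic<int>) == sizeof(int), "layout assumption");
    int *addr() { return reinterpret_cast<int *>(&state_); }

public:
    void lock() {
        int c = 0;
        // Stage 1: one atomic CAS; succeeds immediately if the mutex is free.
        if (state_.compare_exchange_strong(c, 1)) return;
        // Stage 2: mark the mutex as contended and sleep in the kernel.
        if (c != 2) c = state_.exchange(2);
        while (c != 0) {
            // Sleeps only if the int still holds 2; otherwise returns at once.
            syscall(SYS_futex, addr(), FUTEX_WAIT_PRIVATE, 2, nullptr, nullptr, 0);
            c = state_.exchange(2);
        }
    }

    void unlock() {
        int prev = state_.fetch_sub(1);
        assert(prev != 0);  // unlocking a free mutex is a bug
        if (prev != 1) {
            // There may be sleepers: reset to 0 and wake one of them.
            state_.store(0);
            syscall(SYS_futex, addr(), FUTEX_WAKE_PRIVATE, 1, nullptr, nullptr, 0);
        }
    }
};

// Hypothetical check: several threads increment a shared counter under the lock.
long contended_count(int threads, int iterations) {
    FutexMutex mutex;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                mutex.lock();
                ++counter;
                mutex.unlock();
            }
        });
    for (auto &th : pool) th.join();
    return counter;
}
```

Note that nothing in the wake-up path names a next owner: a woken thread has to race for the int again, so a newly arriving thread can take the lock first and the sleeper may go back to waiting.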

Upvotes: 2

ugoren

Reputation: 16441

The POSIX API doesn't contain a function that reports which thread holds a mutex.

It's also possible that on some platforms, the implementation doesn't allow this.
For example, a lock can use an atomic variable, set to 1 when locked. The thread obtaining it doesn't have to write its ID anywhere, so no function can find it.
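To illustrate, here is a minimal sketch of such a lock (the class name FlagLock is made up). The only state is a single atomic int, so after lock() returns there is simply nothing in memory that identifies the holder.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

class FlagLock {
    std::atomic<int> locked_{0};  // 1 while some thread holds the lock

public:
    void lock() {
        int expected = 0;
        // Spin until we are the thread that flips 0 -> 1.
        while (!locked_.compare_exchange_weak(expected, 1,
                                              std::memory_order_acquire)) {
            expected = 0;               // CAS overwrote it with the current value
            std::this_thread::yield();  // be polite while waiting
        }
        // Note: no thread id is written anywhere.
    }

    void unlock() {
        assert(is_locked());  // unlocking a free lock is a bug
        locked_.store(0, std::memory_order_release);
    }

    // All an observer can learn: locked or not, never by whom.
    bool is_locked() const { return locked_.load() != 0; }
};
```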

Upvotes: 0
