Reputation: 4803
I am running a pthread test program until it fails. Here is the main skeleton of the code:
int authSessionListMutexUnlock()
{
    int rc = 0;
    int rc2 = 0;
    rc2 = pthread_mutex_trylock(&mutex);
    ERR_IF( rc2 != EBUSY && rc2 != 0 );
    rc2 = pthread_mutex_unlock(&mutex);
    ERR_IF( rc2 != 0 );
cleanup:
    return rc;
}

static void cleanup_handler(void *arg)
{
    int rc = 0;
    (void)arg;
    rc = authSessionListMutexUnlock();
    if (rc != 0)
        AUTH_DEBUG5("authSessionListMutexUnlock() failed\n");
}

static void *destroy_expired_sessions(void *t)
{
    int rc2 = 0;
    (void)t;
    pthread_cleanup_push(cleanup_handler, NULL);
    rc2 = pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    if (rc2 != 0)
        AUTH_DEBUG5("pthread_setcancelstate(): rc2 == %d\n", rc2);
    rc2 = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
    if (rc2 != 0)
        AUTH_DEBUG5("pthread_setcanceltype(): rc2 == %d\n", rc2);
    while (1)
    {
        ... // destroy expired session
        sleep(min_timeout);
    }
    pthread_cleanup_pop(0);
}

int authDeinit( char *path )
{
    ...
    rc2 = authSessionListDeInit();
    ERR_IF( rc2 != 0 );
    rc2 = pthread_cancel(destroy_thread);
    ERR_IF( rc2 != 0 );
    rc2 = pthread_join(destroy_thread, &status);
    ERR_IF( rc2 != 0 || (int *)status != PTHREAD_CANCELED );
    ...
    return 0;
}
It runs well with the test program, but the test program hangs at round #53743 with pthread_join():
(gdb) bt
#0 0x40000410 in __kernel_vsyscall ()
#1 0x0094aa77 in pthread_join () from /lib/libpthread.so.0
#2 0x08085745 in authDeinit ()
at /users/qixu/src/moja/auth/src//app/libauth/authAPI.c:1562
#3 0x0807e747 in main ()
at /users/qixu/src/moja/auth/src//app/tests/test_session.c:45
It looks like pthread_join() caused a deadlock. But looking at the code, I see no reason why pthread_join() would cause one. When pthread_join() gets the chance to run, the only mutex operation is the one performed by the thread itself. There should be no conflict, right? I'm really confused here...
Upvotes: 1
Views: 4419
Reputation: 215387
A bigger problem with your code, and probably the cause of the deadlocks, is your use of asynchronous cancellation mode (I missed this before). Only 3 functions in POSIX are async-cancel-safe: pthread_cancel(), pthread_setcancelstate(), and pthread_setcanceltype().
Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_05_04
You certainly cannot lock and unlock mutexes while async cancel mode is enabled.
For async cancellation to be usable, you have to do one of the following things:
Edit: Based on the comments, I think you have a misunderstanding of what asynchronous cancellation type means. It has nothing to do with the manner in which cleanup handlers run. It's purely a matter of what point the thread can catch the cancellation request and begin acting on it.
When the target is in deferred cancellation mode, calling pthread_cancel on it will not necessarily do anything right away, unless it's already blocked in a function (like read or select) that's a cancellation point. Instead it will just set a flag, and the next time a function which is a cancellation point is called, the thread will instead block any further cancellation attempts, run the cancellation cleanup handlers in the reverse order they were pushed, and exit with a special status indicating that the thread was cancelled.
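The deferred behaviour described above can be sketched as follows. This is a minimal illustration, not the asker's code: the names `deferred_worker`, `record_handler`, `handlers_pushed`, and `deferred_cancel_demo` are hypothetical, and the semaphore is only there so the cancel request is sent after both handlers have been pushed.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t handlers_pushed;  /* posted once both handlers are registered */
static int run_order[2];       /* handler ids, in the order the handlers ran */
static int run_count;

static void record_handler(void *arg)
{
    if (run_count < 2)
        run_order[run_count++] = *(int *)arg;
}

static void *deferred_worker(void *arg)
{
    static int first = 1, second = 2;
    (void)arg;

    /* New threads start with cancellation enabled and deferred. */
    pthread_cleanup_push(record_handler, &first);
    pthread_cleanup_push(record_handler, &second);
    sem_post(&handlers_pushed);   /* sem_post() is not a cancellation point */
    for (;;)
        sleep(1);                 /* cancellation point: the request is acted on here */
    pthread_cleanup_pop(0);       /* never reached; required to pair with each push */
    pthread_cleanup_pop(0);
    return NULL;
}

/* Returns 1 if the thread exited as cancelled and the handlers ran in
 * reverse push order (second, then first). */
int deferred_cancel_demo(void)
{
    pthread_t tid;
    void *status = NULL;
    int rc;

    run_count = 0;
    rc = sem_init(&handlers_pushed, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&tid, NULL, deferred_worker, NULL);
    assert(rc == 0);
    sem_wait(&handlers_pushed);
    rc = pthread_cancel(tid);
    assert(rc == 0);
    rc = pthread_join(tid, &status);
    assert(rc == 0);
    sem_destroy(&handlers_pushed);

    return status == PTHREAD_CANCELED
        && run_count == 2 && run_order[0] == 2 && run_order[1] == 1;
}
```

Calling `deferred_cancel_demo()` from a `main` built with `-pthread` should return 1: the worker is only ever interrupted inside `sleep`, so its handlers always see a consistent state.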
When the target is in asynchronous cancellation mode, calling pthread_cancel on it will interrupt the thread immediately (possibly between any pair of adjacent machine code instructions). If you don't see why this is potentially dangerous, think about it for a second. Any function that has internal state (static/global variables, file descriptors or other resources being allocated/freed, etc.) could be in inconsistent state at the point of the interruption: a variable partially modified, a lock halfway obtained, a resource obtained but with no record of it having been obtained, or freed but with no record of it having been freed, etc.
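By contrast, here is a sketch of the kind of code that can tolerate asynchronous cancellation: a loop that does nothing but computation, with no library calls, locks, or allocation inside it. The names `compute_worker`, `spinning`, `iterations`, and `async_compute_demo` are hypothetical, and the example assumes an implementation (such as glibc) that delivers the request to a running thread promptly.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t spinning;                     /* posted just before the compute loop */
static volatile unsigned long iterations;  /* stand-in for real computation */

static void *compute_worker(void *arg)
{
    int oldtype;
    (void)arg;

    /* Still in the default deferred mode, so a library call is fine here. */
    sem_post(&spinning);

    /* pthread_setcanceltype() is one of the async-cancel-safe functions. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &oldtype);
    for (;;)
        iterations++;   /* pure computation: safe to interrupt at any instruction */

    return NULL;        /* not reached */
}

/* Returns 1 if the spinning thread was cancelled even though it never
 * reaches a cancellation point. */
int async_compute_demo(void)
{
    pthread_t tid;
    void *status = NULL;
    int rc;

    rc = sem_init(&spinning, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&tid, NULL, compute_worker, NULL);
    assert(rc == 0);
    sem_wait(&spinning);
    rc = pthread_cancel(tid);
    assert(rc == 0);
    rc = pthread_join(tid, &status);
    assert(rc == 0);
    sem_destroy(&spinning);

    return status == PTHREAD_CANCELED;
}
```

If the loop body called `sleep`, `malloc`, or `pthread_mutex_lock` instead, the same request could arrive in the middle of that call and leave its internal state half-updated, which is the hazard described above.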
At the point of the asynchronous interruption, further cancellation requests are blocked, so there's no danger of calling whatever function you like from your cleanup handlers. When the cleanup handlers finish running, the thread of course ceases to exist.
One other potential source of confusion: cleanup handlers do not run in parallel with the thread being cancelled. When cancellation is acted upon, the cancelled thread stops running the normal flow of code, and instead runs the cleanup handlers then exits.
Upvotes: 3
Reputation: 180987
At least one "oddity" shows in your code; your cleanup handler will always unlock the mutex even if you're not the thread holding it.
From the manual:
Calling pthread_mutex_unlock() with a mutex that the calling thread does not hold will result in undefined behavior.
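One common way to avoid unlocking a mutex the thread does not hold is to push the unlock handler only after the lock has been taken, and pop it when the critical section ends. The sketch below assumes deferred cancellation and uses hypothetical names (`list_mutex`, `unlock_handler`, `session_worker`, `locked_cleanup_demo`); it is an illustration of the pattern, not a drop-in fix for the code in the question.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static pthread_mutex_t list_mutex = PTHREAD_MUTEX_INITIALIZER;
static sem_t entered;   /* posted each time the worker holds the mutex */

static void unlock_handler(void *arg)
{
    /* Registered only while the calling thread holds the mutex. */
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *session_worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&list_mutex);
        pthread_cleanup_push(unlock_handler, &list_mutex);
        sem_post(&entered);
        /* ... work on the shared list would go here ... */
        pthread_testcancel();     /* explicit cancellation point, mutex held */
        pthread_cleanup_pop(1);   /* unregister the handler and run it: unlock */
        sleep(1);                 /* cancellation point, mutex not held */
    }
    return NULL;
}

/* Returns 1 if the worker was cancelled and left the mutex unlocked. */
int locked_cleanup_demo(void)
{
    pthread_t tid;
    void *status = NULL;
    int rc;

    rc = sem_init(&entered, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&tid, NULL, session_worker, NULL);
    assert(rc == 0);
    sem_wait(&entered);
    rc = pthread_cancel(tid);
    assert(rc == 0);
    rc = pthread_join(tid, &status);
    assert(rc == 0);
    sem_destroy(&entered);

    rc = pthread_mutex_trylock(&list_mutex);   /* 0 means nobody holds it */
    if (rc == 0)
        pthread_mutex_unlock(&list_mutex);
    return status == PTHREAD_CANCELED && rc == 0;
}
```

Whichever cancellation point the request is acted on at, the handler runs only when the mutex is held by the cancelled thread, so `pthread_mutex_unlock` is never called on a mutex the thread does not own.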
Upvotes: 5