I am having a horrible time trying to figure out why my synchronization code deadlocks when using the pthread library. Using WinAPI primitives instead of pthread works with no problems. Using C++11 threading also works fine (except when compiled with Visual Studio 2012 Service Pack 3, where it just crashes; Microsoft accepted that as a bug). Using pthread, however, proves to be a problem, at least when running it on a Linux machine; I haven't had a chance to try a different OS.
I've written a simple program which illustrates the issue. The code just demonstrates the deadlock; I am well aware the design is pretty poor and could be written much better.
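#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>
typedef struct _pthread_event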
{
pthread_mutex_t Mutex;
pthread_cond_t Condition;
unsigned char State;
} pthread_event;
void pthread_event_create( pthread_event * ev , unsigned char init_state )
{
pthread_mutex_init( &ev->Mutex , 0 );
pthread_cond_init( &ev->Condition , 0 );
ev->State = init_state;
}
void pthread_event_destroy( pthread_event * ev )
{
pthread_cond_destroy( &ev->Condition );
pthread_mutex_destroy( &ev->Mutex );
}
void pthread_event_set( pthread_event * ev , unsigned char state )
{
pthread_mutex_lock( &ev->Mutex );
ev->State = state;
pthread_mutex_unlock( &ev->Mutex );
pthread_cond_broadcast( &ev->Condition );
}
unsigned char pthread_event_get( pthread_event * ev )
{
unsigned char result;
pthread_mutex_lock( &ev->Mutex );
result = ev->State;
pthread_mutex_unlock( &ev->Mutex );
return result;
}
unsigned char pthread_event_wait( pthread_event * ev , unsigned char state , unsigned int timeout_ms )
{
struct timeval time_now;
struct timespec timeout_time;
unsigned char result;
gettimeofday( &time_now , NULL );
timeout_time.tv_sec = time_now.tv_sec + ( timeout_ms / 1000 );
timeout_time.tv_nsec = time_now.tv_usec * 1000 + ( ( timeout_ms % 1000 ) * 1000000 );
pthread_mutex_lock( &ev->Mutex );
while ( ev->State != state )
if ( ETIMEDOUT == pthread_cond_timedwait( &ev->Condition , &ev->Mutex , &timeout_time ) ) break;
result = ev->State;
pthread_mutex_unlock( &ev->Mutex );
return result;
}
static pthread_t thread_1;
static pthread_t thread_2;
static pthread_event data_ready;
static pthread_event data_needed;
void * thread_fx1( void * c )
{
for ( ; ; )
{
pthread_event_wait( &data_needed , 1 , 90 );
pthread_event_set( &data_needed , 0 );
usleep( 100000 );
pthread_event_set( &data_ready , 1 );
printf( "t1: tick\n" );
}
}
void * thread_fx2( void * c )
{
for ( ; ; )
{
pthread_event_wait( &data_ready , 1 , 50 );
pthread_event_set( &data_ready , 0 );
pthread_event_set( &data_needed , 1 );
usleep( 100000 );
printf( "t2: tick\n" );
}
}
int main( int argc , char * argv[] )
{
pthread_event_create( &data_ready , 0 );
pthread_event_create( &data_needed , 0 );
pthread_create( &thread_1 , NULL , thread_fx1 , 0 );
pthread_create( &thread_2 , NULL , thread_fx2 , 0 );
pthread_join( thread_1 , NULL );
pthread_join( thread_2 , NULL );
pthread_event_destroy( &data_ready );
pthread_event_destroy( &data_needed );
return 0;
}
Basically, two threads signal each other to start doing something, and each one does its own thing anyway if it is not signaled within a short timeout.
Any idea what is going wrong there?
Thanks.
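The problem is the timeout_time parameter to pthread_cond_timedwait(). The way you compute it, its nanosecond part will quite soon end up greater than or equal to one billion: time_now.tv_usec * 1000 can be as large as 999999000, so adding 90000000 (the 90 ms timeout) pushes tv_nsec out of range whenever tv_usec is 910000 or more (950000 or more for the 50 ms timeout). That is an invalid value, and pthread_cond_timedwait() then returns EINVAL, apparently without waiting on the condition at all. Since your loop only breaks on ETIMEDOUT, the thread keeps calling pthread_cond_timedwait() in the while loop while still holding ev->Mutex (with glibc the argument is rejected before the mutex is released), so the other thread blocks forever inside pthread_event_set().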
The problem can be found very quickly with valgrind --tool=helgrind ./test_prog (it quite soon reported that it had already detected 10000000 errors and gave up counting):
bash$ gcc -Werror -Wall -g test.c -o test -lpthread && valgrind --tool=helgrind ./test
==3035== Helgrind, a thread error detector
==3035== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al.
==3035== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==3035== Command: ./test
==3035==
t1: tick
t2: tick
t2: tick
t1: tick
t2: tick
t1: tick
t1: tick
t2: tick
t1: tick
t2: tick
t1: tick
==3035== ---Thread-Announcement------------------------------------------
==3035==
==3035== Thread #2 was created
==3035== at 0x41843C8: clone (clone.S:110)
==3035==
==3035== ----------------------------------------------------------------
==3035==
==3035== Thread #2's call to pthread_cond_timedwait failed
==3035== with error code 22 (EINVAL: Invalid argument)
==3035== at 0x402DB03: pthread_cond_timedwait_WRK (hg_intercepts.c:784)
==3035== by 0x8048910: pthread_event_wait (test.c:65)
==3035== by 0x8048965: thread_fx1 (test.c:80)
==3035== by 0x402E437: mythread_wrapper (hg_intercepts.c:219)
==3035== by 0x407DD77: start_thread (pthread_create.c:311)
==3035== by 0x41843DD: clone (clone.S:131)
==3035==
t2: tick
==3035==
==3035== More than 10000000 total errors detected. I'm not reporting any more.
==3035== Final error counts will be inaccurate. Go fix your program!
==3035== Rerun with --error-limit=no to disable this cutoff. Note
==3035== that errors may occur in your program without prior warning from
==3035== Valgrind, because errors are no longer being displayed.
==3035==
^C==3035==
==3035== For counts of detected and suppressed errors, rerun with: -v
==3035== Use --history-level=approx or =none to gain increased speed, at
==3035== the cost of reduced accuracy of conflicting-access information
==3035== ERROR SUMMARY: 10000000 errors from 1 contexts (suppressed: 412 from 109)
Killed
There are also two other minor comments:
1. pthread_event_set(): you could do the condition variable broadcast before the mutex unlock (the wrong ordering can break the determinism of the scheduling; helgrind complains about this issue too);
2. ev->State: this should be an atomic operation.
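A minimal sketch of the first suggestion: the question's pthread_event_set() with the broadcast moved inside the critical section. The struct and pthread_event_get() are copied from the question so the fragment compiles on its own.

```c
#include <pthread.h>

/* Same layout as the event type in the question. */
typedef struct _pthread_event
{
    pthread_mutex_t Mutex;
    pthread_cond_t Condition;
    unsigned char State;
} pthread_event;

void pthread_event_set( pthread_event * ev , unsigned char state )
{
    pthread_mutex_lock( &ev->Mutex );
    ev->State = state;
    /* Broadcast while the mutex is still held, so the state change and the
       wake-up happen inside one critical section. */
    pthread_cond_broadcast( &ev->Condition );
    pthread_mutex_unlock( &ev->Mutex );
}

unsigned char pthread_event_get( pthread_event * ev )
{
    unsigned char result;
    pthread_mutex_lock( &ev->Mutex );
    result = ev->State;
    pthread_mutex_unlock( &ev->Mutex );
    return result;
}
```

POSIX allows pthread_cond_broadcast() to be called with or without the mutex held, but asks for the mutex to be locked when predictable scheduling behavior is required, which is the determinism point made above.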