lawrenceB

Reputation: 53

pthread_join crashes intermittently with segmentation fault on OSX

I'm getting a segmentation fault while joining on a child thread, and I've exhausted every option I could think of: debugging, searching Stack Overflow and the rest of the Internet! :) I'll be as thorough as I can. The code is written in C++ and compiled with GNU GCC on OSX 10.6.8. I've linked in the 'pthread' library using the '-pthread' parameter. I've also tried '-lpthread'. No difference.

I'm using the following global variables:

pthread_t gTid;
pthread_attr_t gAttr;
int gExitThread = 0;

I'm creating a child thread from my main thread of execution:

err = pthread_attr_init(&gAttr);
if (err)
{
    throw CONTROLLER_THREAD_ERROR;
}

err = pthread_attr_setdetachstate(&gAttr, PTHREAD_CREATE_JOINABLE);
if (err)
{
    throw CONTROLLER_THREAD_ERROR;
}

err = pthread_create(&gTid,&gAttr,threadHandler,NULL);
if (err)
{
    throw CONTROLLER_THREAD_ERROR;
}

Inside 'threadHandler', I have the following run loop using the Core Foundation API:

// Enter run loop
result = CFRunLoopRunInMode(kCFRunLoopDefaultMode, RUN_LOOP_TIMEOUT, false);
while (result == kCFRunLoopRunTimedOut)
{
    if (gExitThread) break;
    result = CFRunLoopRunInMode(kCFRunLoopDefaultMode, RUN_LOOP_TIMEOUT, false);
}

The gExitThread global variable is used to signal that the thread should gracefully kill itself. The RUN_LOOP_TIMEOUT macro is set to 2 seconds (although larger and smaller values make no difference).
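One thing worth ruling out with a flag like this: a plain int written by one thread and read by another has no synchronization, so the compiler is free to cache it. Below is a minimal sketch of the same exit-flag pattern with the flag guarded by a mutex. It is not the original code: 'gExitLock', 'gExitRequested', 'requestExit', 'exitRequested', 'pollingWorker' and 'runAndStopWorker' are hypothetical names, and 'usleep' stands in for the 'CFRunLoopRunInMode' timeout so the sketch does not depend on Core Foundation.

```cpp
#include <cassert>
#include <pthread.h>
#include <unistd.h>

// Hypothetical stand-ins for the post's gExitThread; not the original code.
static pthread_mutex_t gExitLock = PTHREAD_MUTEX_INITIALIZER;
static bool gExitRequested = false;

// Called by the main thread to ask the worker to stop.
static void requestExit()
{
    pthread_mutex_lock(&gExitLock);
    gExitRequested = true;
    pthread_mutex_unlock(&gExitLock);
}

// Called by the worker; the lock makes the write from the other thread visible.
static bool exitRequested()
{
    pthread_mutex_lock(&gExitLock);
    bool requested = gExitRequested;
    pthread_mutex_unlock(&gExitLock);
    return requested;
}

// Worker: checks the flag between short waits.
// usleep() is an assumed stand-in for a run loop call that times out.
static void* pollingWorker(void* arg)
{
    long* iterations = static_cast<long*>(arg);
    while (!exitRequested())
    {
        usleep(1000);
        ++*iterations;
    }
    return arg;  // handed back to the joiner through pthread_join
}

// Starts the worker, requests exit, and joins it.
// Returns 0 on success or the first pthread error code.
int runAndStopWorker(long* iterations)
{
    pthread_t tid;
    int err = pthread_create(&tid, NULL, pollingWorker, iterations);
    if (err) return err;

    usleep(20000);  // let the worker spin a few times
    requestExit();

    void* exitValue = NULL;
    err = pthread_join(tid, &exitValue);
    if (err) return err;

    assert(exitValue == iterations);  // the worker returned its argument
    return 0;
}
```

Calling 'runAndStopWorker(&count)' from main should return 0 once the worker has seen the flag and been joined. This only addresses the flag itself; it would not explain a segmentation fault on its own.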

The thread is signalled to be killed by the following piece of code in the main thread:

int err = 0;
void* exitValue = NULL;

printf("Stopping controller thread...\n");

gExitThread = 1;
err = pthread_join(gTid, &exitValue);
if (err)
{
    displayError2(err);
    throw CONTROLLER_THREAD_ERROR;
}

err = pthread_attr_destroy(&gAttr);
if (err)
{
    throw CONTROLLER_THREAD_ERROR;
}

The call to 'pthread_join' crashes with a segmentation fault after a short delay. I've also noticed that replacing the call to 'pthread_join' with a plain two-second sleep, 'usleep(2000000)', causes the exact same segmentation fault. The backtraces from the core dumps for both 'pthread_join' and 'usleep' are below.

pthread_join:

#0  0x00007fff8343aa6a in __semwait_signal ()
#1  0x00007fff83461896 in pthread_join ()
#2  0x000000010000179d in Controller::cleanup () at src/native/osx/controllers.cpp:335
#3  0x0000000100008e51 in ControllersTest::performTest (this=0x100211bf0) at unittests/src/controllers_test.cpp:70
#4  0x000000010000e5b9 in main (argc=2, argv=0x7fff5fbff980) at unittests/src/verify.cpp:34

usleep(2000000):

#0  0x00007fff8343aa6a in __semwait_signal ()
#1  0x00007fff8343a8f9 in nanosleep ()
#2  0x00007fff8343a863 in usleep ()
#3  0x000000010000177b in Controller::cleanup () at src/native/osx/controllers.cpp:335
#4  0x0000000100008e3d in ControllersTest::performTest (this=0x100211bf0) at unittests/src/controllers_test.cpp:70
#5  0x000000010000e5a5 in main (argc=2, argv=0x7fff5fbff980) at unittests/src/verify.cpp:34

Any help will be greatly appreciated.

Upvotes: 5

Views: 2481

Answers (1)

Milan

Reputation: 15849

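It seems that the segfault comes from the child thread, in the code after your while loop inside 'threadHandler', and not from 'pthread_join' itself. If a signal (e.g. SIGSEGV) is generated inside any thread, the whole process gets killed, so the backtrace of the main thread only shows where it happened to be waiting at that moment. That would also explain why you see the same crash with 'usleep'.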

Try running the program under GDB and using 'thread apply all bt' to get a backtrace for all threads.
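To illustrate the point about signals, here is a small sketch (not your code) in which the main thread does nothing except wait in 'pthread_join' while a worker faults. It runs the experiment in a forked child so the parent can report what killed it. 'faultingWorker' and 'signalThatKilledChild' are hypothetical names, and 'raise(SIGSEGV)' stands in for a real bad memory access.

```cpp
#include <cassert>
#include <csignal>
#include <pthread.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Worker that faults on purpose; raise(SIGSEGV) is an assumed stand-in
// for whatever invalid access happens in the real thread.
static void* faultingWorker(void*)
{
    usleep(10000);   // give the main thread time to block in pthread_join
    raise(SIGSEGV);  // default action terminates the whole process
    return NULL;     // not reached
}

// Forks a child whose main thread only waits in pthread_join while its
// worker raises SIGSEGV. Returns the signal number that terminated the
// child, 0 if it exited normally, or -1 if fork/waitpid failed.
int signalThatKilledChild()
{
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0)
    {
        // Child process: disable core files so the demo leaves nothing behind.
        struct rlimit noCore = {0, 0};
        setrlimit(RLIMIT_CORE, &noCore);

        pthread_t tid;
        if (pthread_create(&tid, NULL, faultingWorker, NULL) != 0) _exit(2);

        pthread_join(tid, NULL);  // the process dies while blocked here
        _exit(0);                 // not reached if the worker faults
    }

    // Parent process: wait for the child and inspect how it ended.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

'signalThatKilledChild()' should return SIGSEGV even though the child's main thread never touched invalid memory. A backtrace of only the main thread would point at 'pthread_join', which is why looking at every thread with 'thread apply all bt' matters.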

Upvotes: 8
