**SOLVED:** Inside my class's constructor, the construction of a Semaphore member was racing with the construction of a Thread member that uses it. I wanted the Semaphore to be created first and the Thread second. The solution that worked for me was to create the Semaphore in a base class; that way I can depend on it in my derived class.
I have a fairly small pthreads C++ program that works fine under normal conditions. However, running it under valgrind's thread error checker (helgrind) appears to uncover a race condition. What makes this race particularly hard to avoid is that it happens inside a "Semaphore" class (which just wraps sem_init, sem_wait, and sem_post), so I can't fix it with another Semaphore (and shouldn't have to). I don't think valgrind is reporting a false positive, since my program behaves differently when running under valgrind.
Here's Semaphore.cpp:
```cpp
#include "Semaphore.h"
#include <stdexcept>
#include <errno.h>
#include <iostream>

Semaphore::Semaphore(bool pshared,int initial)
    : m_Sem(new sem_t())
{
    if(m_Sem==0)
        throw std::runtime_error("Semaphore constructor error: m_Sem == 0");
    if(sem_init(m_Sem,(pshared?1:0),initial)==-1)
        throw std::runtime_error("sem_init failed");
}

Semaphore::~Semaphore()
{
    sem_destroy(m_Sem);
    delete m_Sem;
}

void Semaphore::lock()
{
    if(m_Sem==0)
        throw std::runtime_error("Semaphore::lock error: m_Sem == 0");
    int rc;
    for(;;){
        rc = sem_wait(m_Sem);
        if(rc==0) break;
        if(errno==EINTR) continue;
        throw std::runtime_error("sem_wait failed");
    }
}

void Semaphore::unlock()
{
    if(sem_post(m_Sem)!=0)
        throw std::runtime_error("sem_post failed");
}
```
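A side note on the class itself: `new sem_t()` never returns 0 (on failure it throws `std::bad_alloc`), so the constructor's null check can never fire. A zero `m_Sem` can only be seen by code that runs before the constructor has assigned it, which is exactly what the check in `lock()` ended up catching. The sketch below is a hypothetical variant that stores the `sem_t` by value and pairs it with a scoped lock; the default arguments and the `value()` helper are assumptions added for illustration, and it assumes Linux, where unnamed POSIX semaphores (`sem_init`) are supported. With `pshared` set, the object itself would have to live in shared memory. Note that this variant does not fix the ordering problem: a thread can still wait on a semaphore whose constructor has not run yet.

```cpp
#include <semaphore.h>
#include <cerrno>
#include <stdexcept>

// Sketch: the sem_t lives inside the object instead of on the heap,
// so there is no pointer whose value could be observed as 0.
class Semaphore {
public:
    explicit Semaphore(bool pshared = false, unsigned initial = 1) {
        if (sem_init(&m_sem, pshared ? 1 : 0, initial) == -1)
            throw std::runtime_error("sem_init failed");
    }
    ~Semaphore() { sem_destroy(&m_sem); }

    // A sem_t must not be copied, so neither may the wrapper.
    Semaphore(const Semaphore&) = delete;
    Semaphore& operator=(const Semaphore&) = delete;

    void lock() {
        // Retry only when sem_wait is interrupted by a signal.
        while (sem_wait(&m_sem) != 0) {
            if (errno != EINTR)
                throw std::runtime_error("sem_wait failed");
        }
    }
    void unlock() {
        if (sem_post(&m_sem) != 0)
            throw std::runtime_error("sem_post failed");
    }
    // Hypothetical helper, added only so the behavior can be checked.
    int value() {
        int v = 0;
        sem_getvalue(&m_sem, &v);
        return v;
    }

private:
    sem_t m_sem;
};

// Scoped lock: acquires in the constructor, releases in the destructor.
// If unlock() throws here, std::terminate is called, because
// destructors are noexcept by default.
class Lock {
public:
    explicit Lock(Semaphore& s) : m_s(s) { m_s.lock(); }
    ~Lock() { m_s.unlock(); }
    Lock(const Lock&) = delete;
    Lock& operator=(const Lock&) = delete;
private:
    Semaphore& m_s;
};
```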
I have used this Semaphore class in other programs that pass helgrind with no problems, and I'm really not sure what I'm doing differently here that causes the issue. According to helgrind, the race is between a write in Semaphore's constructor in one thread and a read in Semaphore::lock in another thread. Honestly, I don't even see how that's possible: how can a method of an object race with the constructor of that same object? Doesn't C++ guarantee that the constructor has completed before a method can be invoked on the object? How can this ever be violated, even in a multithreaded environment?
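C++ does guarantee that all member subobjects are constructed before the constructor body runs, but they are constructed one at a time, in declaration order. If an earlier member's constructor starts a thread and hands it `this`, that thread can reach a later member that does not exist yet. The single-threaded sketch below only records the order; `Tracer` and `ActionQueueLike` are hypothetical names standing in for the Thread and Semaphore members.

```cpp
#include <string>
#include <vector>

// Global log of construction events (illustration only).
std::vector<std::string> g_log;

// Hypothetical stand-in for a member such as Thread or Semaphore:
// it only records when it is constructed.
struct Tracer {
    explicit Tracer(const char* name) { g_log.push_back(name); }
};

struct ActionQueueLike {
    // Members are constructed in this order, top to bottom,
    // regardless of the order written in the initializer list.
    Tracer thread;     // plays the role of the Thread member
    Tracer semaphore;  // plays the role of the Semaphore member

    ActionQueueLike() : thread("thread"), semaphore("semaphore") {
        g_log.push_back("constructor body");  // runs after all members
    }
};
```

Writing the initializer list in a different order would not change the log; GCC warns about such a mismatch with `-Wreorder`.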
Anyway, now for the valgrind output. I'm using valgrind version "Valgrind-3.6.0.SVN-Debian". Memcheck says all is well. Here's the result of helgrind:
```
$ valgrind --tool=helgrind --read-var-info=yes ./try
==7776== Helgrind, a thread error detector
==7776== Copyright (C) 2007-2009, and GNU GPL'd, by OpenWorks LLP et al.
==7776== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==7776== Command: ./try
==7776==
terminate called after throwing an instance of '==7776== Thread #1 is the program's root thread
==7776==
==7776== Thread #2 was created
==7776==    at 0x425FA38: clone (clone.S:111)
==7776==    by 0x40430EA: pthread_create@@GLIBC_2.1 (createthread.c:249)
==7776==    by 0x402950C: pthread_create_WRK (hg_intercepts.c:230)
==7776==    by 0x40295A0: pthread_create@* (hg_intercepts.c:257)
==7776==    by 0x804CD91: Thread::Thread(void* (*)(void*), void*) (Thread.cpp:10)
==7776==    by 0x804B2D5: ActionQueue::ActionQueue() (ActionQueue.h:40)
==7776==    by 0x80497CA: main (try.cpp:9)
==7776==
==7776== Possible data race during write of size 4 at 0x42ee04c by thread #1
==7776==    at 0x804D9C5: Semaphore::Semaphore(bool, int) (Semaphore.cpp:8)
==7776==    by 0x804B333: ActionQueue::ActionQueue() (ActionQueue.h:40)
==7776==    by 0x80497CA: main (try.cpp:9)
==7776==  This conflicts with a previous read of size 4 by thread #2
==7776==    at 0x804D75B: Semaphore::lock() (Semaphore.cpp:26)
==7776==    by 0x804B3BE: Lock::Lock(Semaphore&) (Lock.h:17)
==7776==    by 0x804B497: ActionQueue::ActionQueueLoop() (ActionQueue.h:56)
==7776==    by 0x8049ED5: void* CallMemFun, &(ActionQueue::ActionQueueLoop())>(void*) (CallMemFun.h:7)
==7776==    by 0x402961F: mythread_wrapper (hg_intercepts.c:202)
==7776==    by 0x404296D: start_thread (pthread_create.c:300)
==7776==    by 0x425FA4D: clone (clone.S:130)
==7776==
std::runtime_error'
  what():  Semaphore::lock error: m_Sem == 0
==7776==
==7776== For counts of detected and suppressed errors, rerun with: -v
==7776== Use --history-level=approx or =none to gain increased speed, at
==7776== the cost of reduced accuracy of conflicting-access information
==7776== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 5 from 5)
```
Anyone with git and valgrind can reproduce this by checking out the `semaphores` branch of my git repository (for the record, currently at commit 262369c2d25eb17a0147) as follows:
```
$ git clone git://github.com/notfed/concqueue -b semaphores
$ cd concqueue
$ make
$ valgrind --tool=helgrind --read-var-info=yes ./try
```
---
Okay, I found the problem. Among other members, my ActionQueue class constructs two objects: a Semaphore and a Thread. The problem was that this Thread, which starts running as soon as it is constructed, uses that Semaphore. I incorrectly assumed that, because the Semaphore is a member object, it would already exist by then. In fact, members are constructed one at a time in the order they are declared, so a Thread member declared before the Semaphore starts running before the Semaphore has been constructed. My solution was to derive ActionQueue from a base class in which the Semaphore is constructed. Base classes are fully constructed before any of the derived class's members, so by the time ActionQueue's own members and constructor run, I can count on the base class's members already being constructed.
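The base-class fix works because base-class subobjects are constructed before any non-static member of the derived class. The following is a minimal sketch under stated assumptions: `SemaphoreHolder` and this `ActionQueue` are hypothetical stand-ins, not the classes from the repository; the thread is started directly with `pthread_create` rather than through a Thread wrapper; and it assumes Linux for unnamed POSIX semaphores.

```cpp
#include <pthread.h>
#include <semaphore.h>
#include <cerrno>
#include <stdexcept>

// Hypothetical base class that owns the semaphore. It is fully
// constructed before any member or constructor body of the derived class.
class SemaphoreHolder {
protected:
    sem_t m_sem;
    SemaphoreHolder() {
        if (sem_init(&m_sem, 0, 0) == -1)
            throw std::runtime_error("sem_init failed");
    }
    ~SemaphoreHolder() { sem_destroy(&m_sem); }
};

class ActionQueue : private SemaphoreHolder {
public:
    ActionQueue() {
        // m_sem is already initialized here, so starting the worker is safe.
        if (pthread_create(&m_worker, nullptr, &ActionQueue::loop, this) != 0)
            throw std::runtime_error("pthread_create failed");
    }
    ~ActionQueue() {
        post();  // wake the worker in case it is still waiting
        join();  // must finish before ~SemaphoreHolder destroys m_sem
    }
    ActionQueue(const ActionQueue&) = delete;
    ActionQueue& operator=(const ActionQueue&) = delete;

    void post() { sem_post(&m_sem); }
    void join() {
        if (!m_joined) {
            pthread_join(m_worker, nullptr);
            m_joined = true;
        }
    }
    bool handled() const { return m_handled; }  // read only after join()

private:
    static void* loop(void* arg) {
        ActionQueue* self = static_cast<ActionQueue*>(arg);
        // Wait for one posted item; retry only if interrupted by a signal.
        while (sem_wait(&self->m_sem) != 0 && errno == EINTR) {}
        self->m_handled = true;
        return nullptr;
    }

    pthread_t m_worker;
    bool m_handled = false;
    bool m_joined = false;
};
```

The destructor matters as much as the constructor here: the worker is joined in `~ActionQueue`, which runs before `~SemaphoreHolder` destroys the semaphore the worker waits on.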
---
It looks like thread 2 is trying to use the Semaphore before thread 1 has finished running the Semaphore's constructor. In that case m_Sem has not been assigned yet, so it can be NULL (0) or any other value.
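Another way to guarantee that the semaphore is initialized before the thread can touch it is to declare the semaphore members before the thread member. Members are constructed in declaration order and destroyed in reverse order, so the thread starts only after the semaphores exist and is joined before they are destroyed. In this sketch `Sem`, `Worker`, and `Queue` are hypothetical names; `Worker` mimics a Thread class that starts running in its constructor, the worker handles exactly one request, and Linux is assumed for unnamed POSIX semaphores.

```cpp
#include <pthread.h>
#include <semaphore.h>
#include <cerrno>
#include <stdexcept>

// Hypothetical RAII wrapper around an unnamed POSIX semaphore.
struct Sem {
    sem_t s;
    explicit Sem(unsigned initial) {
        if (sem_init(&s, 0, initial) == -1)
            throw std::runtime_error("sem_init failed");
    }
    ~Sem() { sem_destroy(&s); }
    void post() { sem_post(&s); }
    void wait() {
        // Retry only when interrupted by a signal.
        while (sem_wait(&s) != 0 && errno == EINTR) {}
    }
    Sem(const Sem&) = delete;
    Sem& operator=(const Sem&) = delete;
};

// Hypothetical thread wrapper: starts the thread in its constructor
// and joins it in its destructor.
struct Worker {
    pthread_t t;
    Worker(void* (*fn)(void*), void* arg) {
        if (pthread_create(&t, nullptr, fn, arg) != 0)
            throw std::runtime_error("pthread_create failed");
    }
    ~Worker() { pthread_join(t, nullptr); }
    Worker(const Worker&) = delete;
    Worker& operator=(const Worker&) = delete;
};

class Queue {
public:
    Queue() : m_request(0), m_reply(0), m_worker(&Queue::loop, this) {}
    // Runs before any member is destroyed: wake the worker so that
    // ~Worker (destroyed first, being declared last) can join it.
    ~Queue() { m_request.post(); }

    // Send one request and wait for the worker's reply.
    int roundtrip() {
        m_request.post();
        m_reply.wait();
        return m_answer;
    }

private:
    static void* loop(void* arg) {
        Queue* self = static_cast<Queue*>(arg);
        self->m_request.wait();
        self->m_answer = 42;   // published to the caller by the post below
        self->m_reply.post();
        return nullptr;
    }

    // Declaration order is construction order: both semaphores exist
    // before m_worker starts the thread that uses them.
    Sem m_request;
    Sem m_reply;
    int m_answer = 0;
    Worker m_worker;  // must stay last
};
```

One drawback worth weighing: this relies on a declaration order that a later edit can silently break, whereas the base-class approach makes the dependency part of the class structure.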