Reputation: 1697
Having what appears to be a dead-lock situation with a multi-threaded logging application.
Little background:
My main application has 4-6 threads running. The main thread responsible for monitoring health of various things I'm doing, updating GUIs, etc... Then I have a transmit thread and a receive thread. The transmit and receive threads talk to physical hardware. I sometimes need to debug the data that the transmit and receive threads are seeing; i.e. print to a console without interrupting them due to their time critical nature of the data. The data, by the way, is on a USB bus.
Due to the threading nature of the application, I want to create a debug console that I can send messages to from my other threads. The debug consule runs as a low priority thread and implements a ring buffer such that when you print to the debug console, the message is quickly stored to a ring buffer and sets and event. The debug console's thread sits WaitingOnSingleObject events from the in bound messages that come in. When an event is detected, console thread updates a GUI display with the message. Simple eh? The printing calls and the console thread use a critical section to control access.
NOTE: I can adjust the ring buffer size if I see that I am dropping messages (at least that's the idea).
In a test application, the console works very well if I call its Print method slowly via mouse clicks. I have a button that I can press to send messages to the console and it works. However, if I put any sort of load (many calls to Print method), everything dead-locks. When I trace the dead-lock, my IDE's debugger traces to EnterCriticalSection and sits there.
NOTE: If I remove the Lock/UnLock calls and just use Enter/LeaveCriticalSection (see the code) I sometimes work but still find myself in a dead-lock situation. To rule out deadlocks to stack push/pops, I call Enter/LeaveCriticalSection directly now but this did not solve my issue.... What's going on here?
Here is one Print statement, that allows me to pass in a simple int to the display console.
void TGDB::Print(int I)
{
//Lock();
EnterCriticalSection(&CS);
if( !SuppressOutput )
{
//swprintf( MsgRec->Msg, L"%d", I);
sprintf( MsgRec->Msg, "%d", I);
MBuffer->PutMsg(MsgRec, 1);
}
SetEvent( m_hEvent );
LeaveCriticalSection(&CS);
//UnLock();
}
// My Lock/UnLock methods
void TGDB::Lock(void)
{
EnterCriticalSection(&CS);
}
bool TGDB::TryLock(void)
{
return( TryEnterCriticalSection(&CS) );
}
void TGDB::UnLock(void)
{
LeaveCriticalSection(&CS);
}
// This is how I implemented Console's thread routines
DWORD WINAPI TGDB::ConsoleThread(PVOID pA)
{
DWORD rVal;
TGDB *g = (TGDB *)pA;
return( g->ProcessMessages() );
}
DWORD TGDB::ProcessMessages()
{
DWORD rVal;
bool brVal;
int MsgCnt;
do
{
rVal = WaitForMultipleObjects(1, &m_hEvent, true, iWaitTime);
switch(rVal)
{
case WAIT_OBJECT_0:
EnterCriticalSection(&CS);
//Lock();
if( KeepRunning )
{
Info->Caption = "Rx";
Info->Refresh();
MsgCnt = MBuffer->GetMsgCount();
for(int i=0; i<MsgCnt; i++)
{
MBuffer->GetMsg( MsgRec, 1);
Log->Lines->Add(MsgRec->Msg);
}
}
brVal = KeepRunning;
ResetEvent( m_hEvent );
LeaveCriticalSection(&CS);
//UnLock();
break;
case WAIT_TIMEOUT:
EnterCriticalSection(&CS);
//Lock();
Info->Caption = "Idle";
Info->Refresh();
brVal = KeepRunning;
ResetEvent( m_hEvent );
LeaveCriticalSection(&CS);
//UnLock();
break;
case WAIT_FAILED:
EnterCriticalSection(&CS);
//Lock();
brVal = false;
Info->Caption = "ERROR";
Info->Refresh();
aLine.sprintf("Console error: [%d]", GetLastError() );
Log->Lines->Add(aLine);
aLine = "";
LeaveCriticalSection(&CS);
//UnLock();
break;
}
}while( brVal );
return( rVal );
}
MyTest1 and MyTest2 are just two test functions that I call in response to a button press. MyTest1 never causes a problem no matter how fast I click the button. MyTest2 dead locks nearly everytime.
// No Dead Lock
void TTest::MyTest1()
{
if(gdb)
{
// else where: gdb = new TGDB;
gdb->Print(++I);
}
}
// Causes a Dead Lock
void TTest::MyTest2()
{
if(gdb)
{
// else where: gdb = new TGDB;
gdb->Print(++I);
gdb->Print(++I);
gdb->Print(++I);
gdb->Print(++I);
gdb->Print(++I);
gdb->Print(++I);
gdb->Print(++I);
gdb->Print(++I);
}
}
UPDATE: Found a bug in my ring buffer implementation. Under heavy load, when buffer wrapped, I didn't detect a full buffer properly so buffer was not returning. I'm pretty sure that issue is now resolved. Once I fixed the ring buffer issue, performance got much better. However, if I decrease the iWaitTime, my dead lock (or freeze up issue) returns.
So after further tests with a much heavier load it appears my deadlock is not gone. Under super heavy load I continue to deadlock or at least my app freezes up but no where near it use to since I fixed ring buffer problem. If I double the number of Print calls in MyTest2 I easily can lock up every time....
Also, my updated code is reflected above. I know make sure my Set & Reset event calls are inside critical section calls.
Upvotes: 3
Views: 6959
Reputation: 36046
With those options closed up, I would ask questions about this "Info" object. Is it a window, which window is it parented to, and which thread was it created on?
If Info, or its parent window, was created on the other thread, then the following situation might occur:
The Console Thread is inside a critical section, processing a message. The Main thread calls Print() and blocks on a critical section waiting for the Console Thread to release the lock. The Console thread calls a function on Info (Caption), which results in the system sending a message (WM_SETTEXT) to the window. SendMessage blocks because the target thread is not in a message alertable state (isn't blocked on a call to GetMessage/WaitMessage/MsgWaitForMultipleObjects).
Now you have a deadlock.
This kind of #$(%^ can happen whenever you mix blocking routines with anything that interacts with windows. The only appropriate blocking function to use on a GUI thread is MSGWaitForMultipleObjects otherwise SendMessage calls to windows hosted on the thread can easily deadlock.
Avoiding this involves two possible approaches:
Upvotes: 4
Reputation: 283733
I would strongly recommend a lockfree implementation.
Not only will this avoid potential deadlock, but debug instrumentation is one place where you absolutely do not want to take a lock. The impact of formatting debug messages on timing of a multi-threaded application is bad enough... having locks synchronize your parallel code just because you instrumented it makes debugging futile.
What I suggest is an SList-based design (The Win32 API provides an SList implementation, but you can build a thread-safe template easily enough using InterlockedCompareExchange and InterlockedExchange). Each thread will have a pool of buffers. Each buffer will track the thread it came from, after processing the buffer, the log manager will post the buffer back to the source thread's SList for reuse. Threads wishing to write a message will post a buffer to the logger thread. This also prevents any thread from starving other threads of buffers. An event to wake the logger thread when a buffer is placed into the queue completes the design.
Upvotes: 2
Reputation: 36046
Without knowing where it is deadlocking this code is hard to figure out. Two comments tho:
Given that this is c++, you should be using an Auto object to perform the lock and unlock. Just in case it ever becomes non catastrophic for Log to throw an exception.
You are resetting the event in response to WAIT_TIMEOUT. This leaves a small window of opportunity for a 2nd Print() call to set the event while the worker thread has returned from WaitForMultiple, but before it has entered the critical section. Which will result in the event being reset when there is actually data pending.
But you do need to debug it and reveal where it "Deadlocks". If one thread IS stuck on EnterCriticalSection, then we can find out why. If neither thread is, then the incomplete printing is just the result of an event getting lost.
Upvotes: 2