ram914
ram914

Reputation: 331

localtime() function is invoking ___lll_lock_wait_private() which is making the thread to go deadlock

I have seen many questions in which ___lll_lock_wait_private () is going into deadlock. But their call stack is different. Their code was calling malloc() or free() or fork().

But in my case, I have a Log class, which is trying to print a log. That log is making my thread to go deadlock.

See below call stack,

#0  0x000000fff4c47b9c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x000000fff4bf0364 in __tz_convert () from /lib64/libc.so.6
#2  0x000000fff4bee2c0 in localtime () from /lib64/libc.so.6
#3  0x000000fff5167188 in getTimeStr() () from /usr/sbin/sajet/sharedobj/liblibLite.so
#4  0x000000fff516756c in LogClass::logBegin() () from /usr/sbin/sajet/sharedobj/liblibLite.so
#5  0x000000fff5318c90 in DaemonCtrlServer::strtDaemon(daemonInfo&) () from /usr/sbin/sajet/sharedobj/libDaemonCtlServer.so
#6  0x000000fff531abc0 in DaemonCtrlServer::restrtDiedDaemon(daemonInfo&) () from /usr/sbin/sajet/sharedobj/libDaemonCtlServer.so
#7  0x000000fff531ae64 in DaemonCtrlServer::handleChildDeath() () from /usr/sbin/sajet/sharedobj/libDaemonCtlServer.so
#8  0x00000000100080ac in sj_initd::DaemonDeathHandler() ()
#9  0x000000001000b8f8 in sj_initd::SignalHandler(int) ()
#10 0x00000000100080e8 in sj_initd_SigHandler::sj_initdSigHandler(int) ()
#11 <signal handler called>
#12 0x000000fff4bedc00 in __offtime () from /lib64/libc.so.6
#13 0x000000fff4bf02a8 in __tz_convert () from /lib64/libc.so.6
#14 0x000000fff4bee2c0 in localtime () from /lib64/libc.so.6
#15 0x000000fff5167188 in getTimeStr() () from /usr/sbin/sajet/sharedobj/liblibLite.so
#16 0x000000fff516756c in LogClass::logBegin() () from /usr/sbin/sajet/sharedobj/liblibLite.so
#17 0x000000fff52e868c in ConnectionOS::ProcessReadEvent() () from /usr/sbin/sajet/sharedobj/libconnV2.so
#18 0x000000fff52ef354 in ConnectionOSManager::ProcessConns(fd_set*, fd_set*) () from /usr/sbin/sajet/sharedobj/libconnV2.so
#19 0x000000fff52f0a0c in SocketsManager::ProcessFds(bool) () from /usr/sbin/sajet/sharedobj/libconnV2.so
#20 0x000000fff52c51b4 in EventReactorBase::IO (this=0x19a65e80) at EventReactorBase.cpp:361
#21 0x000000fff52c457c in EventReactorBase::React (this=0x19a65e80) at EventReactorBase.cpp:419
#22 0x000000fff52c10cc in Task::Run (this=0x19a65e30) at Task.cpp:222
#23 0x000000fff52c1218 in startTask (t=0x19a65e30) at Task.cpp:152
#24 0x000000001000a9c4 in TaskManager::Start() ()
#25 0x0000000010007538 in main ()

sj_init is a daemon which monitors the live status of other daemons in a system. When a daemon dies(which closes the connection with sj_init), it tries to restart that daemon. Then, startDaemon() is trying to print a log, which is calling getTimeStr() which internally calling ___lll_lock_wait_private

Edit

as localtime is not threadsafe, I tried with localtime_r but it also lead the thread to go into a deadlock. but acc. to localtime_r description this is threadsafe function. What am I doing wrong here?

Upvotes: 0

Views: 1301

Answers (2)

David Schwartz
David Schwartz

Reputation: 182779

#4  0x000000fff516756c in LogClass::logBegin() () from /usr/sbin/sajet/sharedobj/liblibLite.so
#16 0x000000fff516756c in LogClass::logBegin() () from /usr/sbin/sajet/sharedobj/liblibLite.so

Notice that LogClass::logBegin appears in your call stack twice?

You have two core problems.

  1. You are calling localtime from a signal handler. That is not allowed.

  2. You are looking at the wrong thread's stack. This thread got stuck in a deadlock caused by other threads and then was interrupted, getting stuck in the deadlock again. It's a victim of the deadlock (twice!). The perpetrator is another thread.

If you're going to log from a signal handlers, you need very tight control over all the code that runs in the signal handler and you need to pass the "heavy lifting" off to another thread that can safely call non-reentrant functions.

Upvotes: 2

alk
alk

Reputation: 70951

The programs blocks inside a signal handler in __tz_convert() being called from localtime(). The signal handler interrupted __tz_convert() being called from localtime().

#0  0x000000fff4c47b9c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x000000fff4bf0364 in __tz_convert () from /lib64/libc.so.6
#2  0x000000fff4bee2c0 in localtime () from /lib64/libc.so.6
...
#11 <signal handler called>
...
#13 0x000000fff4bf02a8 in __tz_convert () from /lib64/libc.so.6
#14 0x000000fff4bee2c0 in localtime () from /lib64/libc.so.6
...
#25 0x0000000010007538 in main ()

localtime() seems to not be reintrant.

A signal handler may only call a very specific set of functions. Scroll down here: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03

localtime() is not within this set.

Signal handler should be short and simple.

You could set up a thread doing the formatting and logging which for example via a pipe is being fed by the signal handler with all necessary information to format and log. The functions to do so are listed within the set linked above.

Upvotes: 2

Related Questions