Reputation: 485
I have a daemon, written in Python 2.7, that works like this:
1 - the script starts 4 threads
2 - the 4 threads do some work simultaneously
3 - the script waits for all threads to finish, using thread.join()
4 - steps 1-3 repeat in a loop
In pseudocode it looks like this:
import logging
import threading
import time

formatter = logging.Formatter('%(threadName)s : %(message)s')
# (... logging setup ...)

def doSomeWork(item):
    log.debug('Doing some work with item %s', item)
    # (... doing some work ...)

itemList = [some, items, thatProgram, worksWith]
while True:
    threads = []
    for item in itemList:
        if someComplexConditionCheck:
            threads.append(threading.Thread(target=doSomeWork, args=(item,)))
    # Start every thread, then wait for all of them before the next pass.
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    time.sleep(10)
(Of course, the real program is much more complex. The real doSomeWork code may start other threads, but it also uses thread.join() to wait for them to finish.) The main while True loop never continues until all previous threads have finished.
After several days my program crashes with "error: can't start new thread". The last record in the log corresponds to thread 15027 and looks like this:
Thread-15027 Doing some work
I searched Stack Overflow, but all the advice I found was to check the number of threads running at once with the command ps -fLu UserName.
There I see only the threads running at the moment of checking, so older threads ALWAYS finish before new ones start, because of the join calls.
I think the problem may be the large thread ID (15027), which is incremented on each call to the Thread constructor. (Am I right?) But I have no idea how to reset it other than restarting the daemon every day from crontab, which is a very dirty hack.
Upvotes: 0
Views: 456
Reputation: 3856
Since the problem is not related directly to thread creation, can you check whether memory is running out? That seems like the most likely culprit. But, regardless of the cause, one way to verify is to run your app under strace and look for ENOMEM errors, or any other errors.
My original thought was to have your app check available memory, but doing so from the app gets tricky because the OS will sometimes use large amounts of otherwise free memory (for caching, for example) and give it up when an app needs it, so the reported free memory can be misleading.
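If you still want a rough in-app signal, a minimal sketch is to log the process's own peak memory and live thread count on every pass of the main loop instead of asking the OS how much memory is free. The helper name resource_snapshot is made up for illustration, and this assumes a Unix system where the standard resource module is available. Note that ru_maxrss is a peak value, reported in kilobytes on Linux, so it only shows growth, not the current footprint.

```python
import resource
import threading


def resource_snapshot():
    """Return this process's peak resident memory and live thread count.

    Hypothetical helper, not part of the original program.
    """
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        # Peak resident set size: kilobytes on Linux, bytes on macOS.
        'max_rss': usage.ru_maxrss,
        # Threads that have been started and have not yet finished.
        'live_threads': threading.active_count(),
    }
```

Calling something like log.debug('resources: %s', resource_snapshot()) at the top of the while loop would show whether either number keeps climbing over the days before the crash.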
To capture the trace, run the app under strace:
strace -o app_strace.log python app.py myarg1 myarg2
POSIX system calls normally return -1 if there's an error, so you can grep the log file for failures:
grep " = -1" app_strace.log
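If the grep output is too long to read by eye, a small script can tally the failures by system call and error code. This is only a sketch: count_failed_syscalls and summarize are made-up names, and the pattern assumes the usual strace line shape of name(args) = -1 ERRNO (message), optionally prefixed with a PID as happens when strace follows threads with -f.

```python
import re
from collections import Counter

# Matches e.g. 'mmap(NULL, ...) = -1 ENOMEM (Cannot allocate memory)',
# with an optional leading PID column.
FAILED_CALL = re.compile(r'^(?:\d+\s+)?(\w+)\(.*=\s+-1\s+(E[A-Z0-9]+)')


def count_failed_syscalls(lines):
    """Count (syscall, errno) pairs for every failed call in the lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_CALL.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def summarize(path):
    """Print failed calls from an strace log, most frequent first."""
    with open(path) as log_file:
        counts = count_failed_syscalls(log_file)
    for (syscall, errno_name), total in counts.most_common():
        print('%6d  %s  %s' % (total, syscall, errno_name))
```

Running summarize('app_strace.log') after the crash should make any ENOMEM entries stand out. An EAGAIN on clone would instead hint at a thread or process limit rather than memory, though that reading is an assumption to check against your own log.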
Upvotes: 0