Egor

Reputation: 485

Python error "can't start new thread", but there are no other threads running

I have a daemon, written in Python 2.7, which works like this:

1 - the script starts 4 threads

2 - the 4 threads do some work simultaneously

3 - the script waits for all threads to finish, using thread.join()

4 - steps 1-3 repeat in a loop

In pseudocode it looks like:

import logging
import threading
import time
formatter = logging.Formatter('%(threadName)s : %(message)s')
# (... logging setup ...)
def doSomeWork(item):
    log.debug('Doing some work with item %s', item)
    # (... doing some work ...)
itemList = [some, items, thatProgram, worksWith]
while True:
    threads = []
    for item in itemList:
        if someComplexConditionCheck:
            threads.append(threading.Thread(target=doSomeWork, args=(item,)))
    # Start every thread in the batch, then wait for all of them to finish.
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    time.sleep(10)

(Of course, the real program is much more complex. The real doSomeWork code may start some other threads, but it also uses thread.join() to wait for them to finish.) The main while True loop never continues until all previous threads have finished.

After several days my program crashes with "error: can't start new thread". The last record in the log corresponds to thread 15027 and looks like:

Thread-15027 Doing some work

I searched Stack Overflow, but all the advice I found was to check the number of threads running at once with the command ps -fLu UserName. There I see only the threads running at the moment of checking, so older threads have ALWAYS finished before new ones start, because of the join calls.

I think the problem may be the large thread ID (15027), which is incremented on each call to the Thread constructor. (Am I right?) But I have no idea how to reset it other than restarting the daemon every day via crontab, which is a very dirty hack.
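One way to test that guess is a minimal sketch like the one below. It uses only the standard threading module; the names worker and run_batches are made up for illustration. It starts and joins threads in batches, the way the daemon loop does, and compares the default Thread-N names with threading.active_count(). In CPython the number in the default name comes from a counter used only for naming, so it keeps growing while the count of live threads returns to its starting value after each join.

```python
import threading

def worker(results, index):
    # Hypothetical stand-in for the real work: record this thread's name.
    results[index] = threading.current_thread().name

def run_batches(batches=3, per_batch=4):
    """Start and join `per_batch` threads, `batches` times, like the daemon loop."""
    names = []
    for _ in range(batches):
        results = [None] * per_batch
        threads = [threading.Thread(target=worker, args=(results, i))
                   for i in range(per_batch)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        names.extend(results)
    return names

if __name__ == "__main__":
    before = threading.active_count()
    names = run_batches()
    # The name suffix keeps increasing from batch to batch ...
    print("first: %s, last: %s" % (names[0], names[-1]))
    # ... while the number of live threads is the same before and after.
    print("live threads before: %d, after: %d"
          % (before, threading.active_count()))
```

If the live count printed by the real daemon also stays flat, the growing number in the thread name is unlikely to be the limit being hit.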

Upvotes: 0

Views: 456

Answers (1)

Brad Dre

Reputation: 3856

Since the problem does not seem to be related directly to thread creation, can you check whether memory is running out? That seems like the most likely culprit. Regardless of the cause, one way to verify is to run your app under strace and look for ENOMEM or any other errors.

My original thought was to have your app check available memory, but doing so from the app gets tricky because the OS will sometimes use large amounts of free memory, then give it up when an app needs it.
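If some in-app check is still wanted, a narrower option is to record the process's own figures at the moment start() fails, instead of trying to judge free memory system-wide. This is only a rough sketch under stated assumptions: it assumes Linux (it reads /proc/self/status), and read_proc_status and start_with_diagnostics are hypothetical helper names, not part of any library.

```python
import threading

def read_proc_status(path="/proc/self/status",
                     keys=("Threads", "VmSize", "VmRSS")):
    """Return selected fields from a Linux /proc/<pid>/status file as a dict."""
    found = {}
    try:
        with open(path) as handle:
            for line in handle:
                name, _, value = line.partition(":")
                if name in keys:
                    found[name] = value.strip()
    except IOError:
        # Not Linux, or /proc is unavailable: report nothing rather than fail.
        pass
    return found

def start_with_diagnostics(thread, log):
    """Start `thread`; if that fails, log process state and re-raise."""
    try:
        thread.start()
    except Exception:
        # Python 2.7 raises thread.error here, Python 3 raises RuntimeError.
        log("start failed: python threads=%d, status=%r"
            % (threading.active_count(), read_proc_status()))
        raise
```

In the loop from the question, thread.start() would become start_with_diagnostics(thread, log.error), so the last log record before the crash shows the thread count and memory size of the process itself.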

strace -o app_strace.log python app.py myarg1 myarg2

POSIX system calls normally return -1 on error, so you can grep the log file for the failed calls:

grep " = -1" app_strace.log
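If the grep output is long, it may help to group the failures by system call and error name. The following is a small sketch, not a definitive tool: it assumes the usual strace line layout, such as mmap(...) = -1 ENOMEM (Cannot allocate memory), optionally prefixed by a process ID, and the function names count_failures and summarize are made up for this example.

```python
import collections
import re

# Matches a failed call in strace output, for example:
#   mmap(NULL, 8392704, ...) = -1 ENOMEM (Cannot allocate memory)
# An optional leading process ID (present in some strace modes) is skipped.
FAILED_CALL = re.compile(r"^(?:\d+\s+)?(\w+)\(.*\)\s+= -1 (E[A-Z0-9]+)")

def count_failures(lines):
    """Count (syscall, errno name) pairs among the failed calls in `lines`."""
    counts = collections.Counter()
    for line in lines:
        match = FAILED_CALL.search(line)
        if match:
            counts[match.groups()] += 1
    return counts

def summarize(path):
    """Return the failures in the strace log at `path`, most frequent first."""
    with open(path) as handle:
        return count_failures(handle).most_common()
```

Calling summarize("app_strace.log") returns a list of ((syscall, errno), count) pairs. Many failures are routine (a missing file during module lookup, for instance), so the entries worth a closer look are ENOMEM or EAGAIN on calls such as mmap or clone.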

Upvotes: 0
