Blindfreddy

Reputation: 702

What is the best way to debug a Python multiprocessing script that fails to terminate?

I am writing a Python script that uses multiprocessing, multithreading and ZeroMQ for interprocess communication. Everything works until the program finishes. At that point the child processes terminate properly (sigwait is intercepted and the child processes exit, which I have confirmed with the ps command), but the main process usually does not shut down; only occasionally does it exit. I have confirmed that all remaining threads of the main process are daemonic and that the last line of the script (a logging.info call) is executed. I am using fork to start the processes and can see that a ForkProcess is still running in addition to the main process.

What is the best way to debug this, considering that the script has actually finished? Maybe add a pdb or breakpoint() call right at the end?

Thanks in advance.

Here is the output; after the last line the script usually does not terminate:

INFO root::remaining active child processes: [<ForkProcess name='SyncManager-1' pid=6362 parent=6361 started>]

INFO root::non-daemonic threads which are still running, preventing orderly shutdown: [].

INFO root::======== PID: 6361 main() end: shut down completed.=========
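For reference, diagnostics like the ones above can be produced with the standard library alone. The following is only a minimal sketch, not the original script: the function name report_leftovers and the log format string are assumptions chosen to resemble the output shown.

```python
import logging
import multiprocessing
import threading


def report_leftovers():
    """Log and return child processes and non-daemon threads that are still alive.

    Hypothetical helper: the name and the message wording are illustrative only.
    """
    # active_children() also reaps children that have already finished.
    children = multiprocessing.active_children()
    # Non-daemon threads other than the main thread keep the interpreter alive.
    blockers = [
        t for t in threading.enumerate()
        if t is not threading.main_thread() and not t.daemon
    ]
    logging.info("remaining active child processes: %s", children)
    logging.info("non-daemonic threads which are still running: %s", blockers)
    return children, blockers


if __name__ == "__main__":
    # Assumed format, picked to look like the "INFO root::..." lines above.
    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s %(name)s::%(message)s")
    report_leftovers()
```

Note that a process started by multiprocessing.Manager() shows up in active_children() as a SyncManager child, which matches the first log line above.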

EDIT:

I refactored the code and noticed that it now misbehaves only very rarely. I am 99.9% certain that the cause is a ZeroMQ REQ/REP socket that is still open at shutdown. The refactoring ensures that these sockets are held open only for a very short time, but which sockets are open at shutdown is not predictable, so it still hangs occasionally.

I will write a simple test harness with two processes communicating via REQ/REP sockets, then shut down the child process followed by the main process. I expect the same result, i.e., the interpreter not shutting down. Let's see; I will keep you posted.

Upvotes: 1

Views: 989

Answers (1)

minker

Reputation: 680

I think you could try viztracer. The good thing about viztracer is that it can display all the processes on the same timeline, so you may be able to see what is stopping your main process or forked process from shutting down. If it is a deadlock, it should be noticeable. Without the code, however, I can't tell for sure whether it would help.
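A standard-library option that may also be worth a try is faulthandler, which can dump the stack of every thread if the process is still alive after a timeout. This is only a rough sketch: the helper names arm_exit_watchdog and disarm_exit_watchdog and the 10-second default are made up for illustration, and I am assuming the hang happens while the interpreter is still joining threads or running atexit handlers, before faulthandler itself is shut down.

```python
import faulthandler
import sys


def arm_exit_watchdog(timeout=10.0, file=sys.stderr):
    """Dump all thread stacks to `file` if the process is still alive after `timeout` seconds.

    Hypothetical helper. `file` must be a real file with a file descriptor,
    because faulthandler writes to it from a watchdog thread.
    """
    # exit=False: only report where each thread is stuck, do not kill the process.
    faulthandler.dump_traceback_later(timeout, exit=False, file=file)


def disarm_exit_watchdog():
    """Cancel a pending dump, e.g. when shutdown turned out to be clean."""
    faulthandler.cancel_dump_traceback_later()
```

Calling arm_exit_watchdog() as the very last statement of main(), right after the final logging.info call, should print where the main thread is blocked if the interpreter has not exited within the timeout. That does not require attaching pdb to a script that has already finished.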

Upvotes: 2
