Reputation: 702
I am running a multi-process (and multi-threaded) Python script on Debian Linux. One of the processes repeatedly crashes after 5 or 6 days. It is always the process handling the same, unique workload that crashes. There are no entries in syslog about the crash; the process simply disappears silently. Until then it behaves completely normally and produces normal results, then it suddenly stops.
How can I instrument the rogue process? Increasing the log level would produce large amounts of logs, so that's not my preferred option.
Upvotes: 0
Views: 150
Reputation: 702
I used good old log analysis to determine what happens when the process fails.
I found the following error at that time. The first line is the last entry made by the rogue process (just before it fails); the second line is the one pointing to the underlying error. In this case there is a problem with the pyzmq bindings or the zeromq library. I'll open a ticket with them.
Aug 10 08:30:13 rpi6 python[16293]: 2021-08-10T08:30:13.045 WARNING w1m::pid 16325, tid 16415, taking reading from sensors with map {'000005ccbe8a': ['t-top'], '000005cc8eba': ['t-mid'], '00000676e5c3': ['t
Aug 10 08:30:14 rpi6 python[16293]: Too many open files (bundled/zeromq/src/ipc_listener.cpp:327)
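"Too many open files" is the message for hitting the per-process file descriptor limit, so one low-volume way to instrument the process is to watch its descriptor count instead of raising the log level. The following is only a minimal sketch, assuming Linux (it reads /proc/self/fd); the logger name `fdwatch`, the function names, and the 80% threshold are hypothetical choices of mine, not part of the original script.

```python
import logging
import os
import resource
import threading

log = logging.getLogger("fdwatch")  # hypothetical logger name


def open_fd_count():
    """Count this process's open file descriptors (Linux only).

    Listing /proc/self/fd briefly opens one descriptor itself,
    so the result is one higher than the steady-state count.
    """
    return len(os.listdir("/proc/self/fd"))


def fd_usage():
    """Return (open_count, soft_limit) for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fd_count(), soft


def watch_fds(interval=60.0, warn_ratio=0.8, stop_event=None):
    """Periodically warn when descriptor usage nears the soft limit.

    Runs until stop_event is set; intended for a daemon thread.
    """
    stop_event = stop_event or threading.Event()
    while not stop_event.wait(interval):
        used, limit = fd_usage()
        if limit == resource.RLIM_INFINITY:
            continue  # no finite limit to compare against
        if used >= warn_ratio * limit:
            log.warning("pid %d: %d of %d file descriptors in use",
                        os.getpid(), used, limit)
```

Started once per process, for example with `threading.Thread(target=watch_fds, daemon=True).start()`, this should stay silent until usage approaches the limit, which keeps the log volume small while still leaving a trail well before the crash.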
Hope this helps someone in the future.
Upvotes: 1