AmirHmZ

Reputation: 556

Process with finished thread never exits

Why would a thread persist and prevent its process from exiting, even after its target is done?

While this question uses an additional child process, the underlying issue is rooted entirely in multithreading, so it can be reproduced with the MainProcess alone. (Edited by @Darkonaut)

I've made a class that inherits multiprocessing.Process:

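import threading
from multiprocessing import Process


class Task(Process):
    def run(self):
        print("RUN")

        # do_some_work is defined elsewhere in the project (not shown).
        t = threading.Thread(target=do_some_work)
        t.start()
        # ...
        t.join()
        print("CLOSED")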

And I start it in this way:

proc = Task()
proc.start()
proc.join()
print("JOINED")

But it won't join and the output will be like this:

>> RUN
>> CLOSED

I'm not using any kind of Queues and Pipes.

When I ran this on Ubuntu, I tracked the process by its pid. The process still exists even after the print("CLOSED") line has run without any exceptions. I also ran this on Windows and tracked the process in Task Manager. There the process exits after print("CLOSED"), but the parent still does not return from join().

Another point is that on Ubuntu, when everything is stuck after print("CLOSED") and I press Ctrl+C, I get this:

Traceback (most recent call last):
  File "Scheduler.py", line 164, in <module>
    scheduler.start()
  File "Scheduler.py", line 152, in start
    self.enqueueTask(plan)
  File "Scheduler.py", line 134, in enqueueTask
    proc.join()
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/usr/local/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/local/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)

According to the last line, I guess the main process is waiting for something, but what and why?

The problem seems to be with a non-daemon thread that I'm starting in the run() method of Task. Making this thread a daemon thread solves the problem, so I can say for sure that this thread is preventing my process from exiting even after its MainThread is done. I'm still confused, because the target function of that non-daemon thread completes successfully.

Upvotes: 1

Views: 2644

Answers (1)

Darkonaut

Reputation: 21644

Why would a thread persist and prevent its process from exiting, even after its target is done?

While this question uses an additional child process, the underlying issue is rooted entirely in multithreading, so it can be reproduced with the MainProcess alone. An answer involving an additional child process can be found in edit 2.


Scenario

Without having seen what the new thread in your child process is really doing, a likely scenario for the behavior you observe is that your thread-1 starts yet another thread-2, possibly without you being aware of it. It might be started from a third-party library you are calling into or, to stay within the stdlib, by multiprocessing.Queue.put(), which also starts a feeder thread in the background.

This general scenario is neither a Process-subclassing issue nor related to calling Process.close() from within the child process itself (incorrect usage, but without consequences).

The MainThread is always the last thread of a process to exit, and it joins all non-daemonic threads as part of its _shutdown() routine. That is what keeps the MainThread in a limbo state while its "surface" work is already done.
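The shutdown join can be observed from the outside by timing a short script. The sketch below is not part of the original setup: it launches a throwaway child interpreter with subprocess, and the script text, the names DELAY and CHILD_SCRIPT, and the 1-second delay are illustrative assumptions. The child's main thread finishes at once, yet the interpreter only exits once the pending non-daemon Timer thread is done.

```python
import subprocess
import sys
import time

# Illustrative child script: the main thread finishes immediately,
# but a Timer thread (non-daemon, inherited from MainThread) stays
# alive for DELAY seconds.
DELAY = 1.0
CHILD_SCRIPT = f"""
import threading
threading.Timer({DELAY}, lambda: None).start()
print("main done", flush=True)
"""

start = time.monotonic()
result = subprocess.run(
    [sys.executable, "-c", CHILD_SCRIPT],
    capture_output=True, text=True, check=True,
)
elapsed = time.monotonic() - start

print(result.stdout.strip())
print(f"interpreter exited after {elapsed:.2f}s")
```

The elapsed time should be at least roughly DELAY, even though "main done" is printed right away.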

To quote the question: "The problem is with a non-daemon thread that I'm starting in run() method of Task, so I can surely say that thread is preventing my process to be closed even after its MainThread is done. But I'm still confused because target function of that non-daemon thread is done successfully."

In the scenario pictured here, the target function of thread-1 can indeed finish successfully. However, thread-1 has started another thread-2, which then does something that takes very long or, in the worst case, blocks forever.
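To check whether such a hidden thread exists, you could list the non-daemon threads that are still alive right before your code returns. This is a minimal sketch; lingering_threads is a hypothetical helper name, and the "hidden-worker" thread merely stands in for an unknown thread-2.

```python
import threading


def lingering_threads():
    """Return alive non-daemon threads other than the main thread."""
    main = threading.main_thread()
    return [
        t for t in threading.enumerate()
        if t is not main and not t.daemon
    ]


if __name__ == "__main__":
    # Stand-in for a hidden thread-2: a worker that blocks until released.
    release = threading.Event()
    hidden = threading.Thread(target=release.wait, name="hidden-worker")
    hidden.start()

    print([t.name for t in lingering_threads()])

    release.set()  # let the stand-in finish so this demo can exit
    hidden.join()
    print([t.name for t in lingering_threads()])
```

Calling the helper from the MainThread at the end of run() (after t.join()) would show the names of any threads the MainThread still has to join on shutdown.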

Q: If thread-1 itself is not the problem, why is there no hang when you make thread-1 a daemon?

It's because the daemon flag's "initial value is inherited from the creating thread". So making thread-1 a daemon makes its descendant thread-2 a daemon too, unless the daemon flag for thread-2 is set explicitly. Daemons are not joined on shutdown, and the whole process "exits when no alive non-daemon threads are left".
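The inheritance is easy to verify in isolation. In this small sketch, the function names and the observed dictionary are illustrative assumptions; a parent thread creates a child thread and records the child's daemon flag, once per combination of parent flag and explicit override.

```python
import threading

observed = {}


def child_target():
    pass


def parent_target(label, explicit=None):
    # The child is created *inside* the parent thread, so its daemon flag
    # defaults to the parent's unless `daemon=` is passed explicitly.
    child = threading.Thread(target=child_target, daemon=explicit)
    observed[label] = child.daemon
    child.start()
    child.join()


for label, parent_daemon, explicit in [
    ("daemon parent", True, None),
    ("non-daemon parent", False, None),
    ("daemon parent, explicit daemon=False", True, False),
]:
    parent = threading.Thread(
        target=parent_target, args=(label, explicit), daemon=parent_daemon
    )
    parent.start()
    parent.join()

for label, flag in observed.items():
    print(f"{label}: child.daemon={flag}")
```

Only the first case yields a daemonic child; the explicit daemon=False in the last case overrides the inherited value.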

Note that prior to Python 3.7, non-daemonic threads created in a child process (Process) were not joined on shutdown. This divergent behaviour for threads outside the MainProcess was fixed in bpo-18966.


Code

To show that this scenario is reproducible with a simpler setup, the example below uses the MainProcess as the process that won't exit. thread-2 here is a Timer thread, which starts and, after 10 seconds, calls threading.Barrier(parties=parties).wait(). With parties=1 this .wait() call returns immediately; with parties=2 it blocks forever, because no other party calling .wait() on this Barrier exists in our setup. This makes it easy to toggle the behavior we want to reproduce.

import threading

# get_mp_logger() is defined in the "Helper" section at the end of this answer.

def blackbox(parties):
    """Dummy for starting thread we might not know about."""
    timer = threading.Timer(10, threading.Barrier(parties=parties).wait)  # Thread-2
    timer.name = "TimerThread"
    timer.start()


def t1_target(parties):  # Thread-1
    """Start another thread and exit without joining."""
    logger = get_mp_logger()
    logger.info(f"ALIVE: {[t.name for t in threading.enumerate()]}")
    blackbox(parties)
    logger.info(f"ALIVE: {[t.name for t in threading.enumerate()]}")
    logger.info("DONE")


if __name__ == '__main__':

    import logging

    parties = 1
    daemon = False
    print(f"parties={parties}, daemon={daemon}")

    logger = get_mp_logger(logging.INFO)
    logger.info(f"ALIVE: {[t.name for t in threading.enumerate()]}")
    t = threading.Thread(target=t1_target, args=(parties,), daemon=daemon)
    t.start()
    t.join()
    logger.info(f"ALIVE: {[t.name for t in threading.enumerate()]}")    
    logger.info("DONE")

The log below is for parties=1, so there is no infinite blocking, but since thread-2 is not a daemon thread, the MainThread will join it on shutdown. Note that TimerThread is still alive after t1_target is done. Of main interest here is that the MainThread needs ~10 seconds to go from "DONE" to "process shutting down". These are the 10 seconds during which TimerThread is alive.

parties=1, daemon=False
[18:04:31,977 MainThread <module>] ALIVE: ['MainThread']
[18:04:31,977 Thread-1 t1_target] ALIVE: ['MainThread', 'Thread-1']
[18:04:31,978 Thread-1 t1_target] ALIVE: ['MainThread', 'Thread-1', 'TimerThread']
[18:04:31,978 Thread-1 t1_target] DONE
[18:04:31,978 MainThread <module>] ALIVE: ['MainThread', 'TimerThread']
[18:04:31,978 MainThread <module>] DONE
[18:04:41,978 MainThread info] process shutting down

Process finished with exit code 0

With parties=2 it hangs forever at this stage (the "process shutting down" line never appears)...

parties=2, daemon=False
[18:05:06,010 MainThread <module>] ALIVE: ['MainThread']
[18:05:06,010 Thread-1 t1_target] ALIVE: ['MainThread', 'Thread-1']
[18:05:06,011 Thread-1 t1_target] ALIVE: ['MainThread', 'Thread-1', 'TimerThread']
[18:05:06,011 Thread-1 t1_target] DONE
[18:05:06,011 MainThread <module>] ALIVE: ['MainThread', 'TimerThread']
[18:05:06,011 MainThread <module>] DONE

...unless you also set daemon=True, either for thread-1 (thread-2 inheriting) or just for thread-2 directly.

parties=2, daemon=True
[18:05:35,539 MainThread <module>] ALIVE: ['MainThread']
[18:05:35,539 Thread-1 t1_target] ALIVE: ['MainThread', 'Thread-1']
[18:05:35,539 Thread-1 t1_target] ALIVE: ['MainThread', 'Thread-1', 'TimerThread']
[18:05:35,539 Thread-1 t1_target] DONE
[18:05:35,539 MainThread <module>] ALIVE: ['MainThread', 'TimerThread']
[18:05:35,539 MainThread <module>] DONE
[18:05:35,539 MainThread info] process shutting down

Process finished with exit code 0
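The second option, marking only thread-2 as a daemon, could look like the following variant of blackbox. The name blackbox_daemon, the delay parameter and the returned timer are assumptions added for illustration, and this only applies when you control the code that creates the thread.

```python
import threading


def blackbox_daemon(parties, delay=10):
    """Variant of blackbox() whose timer thread is explicitly a daemon."""
    timer = threading.Timer(delay, threading.Barrier(parties=parties).wait)
    timer.name = "TimerThread"
    timer.daemon = True  # must be set before start()
    timer.start()
    return timer


if __name__ == "__main__":
    timer = blackbox_daemon(parties=2)
    print(f"{timer.name}: daemon={timer.daemon}, alive={timer.is_alive()}")
    # MainThread ends here; the daemon timer is not joined on shutdown,
    # so the process exits even though the barrier would block forever.
```

Keep in mind that daemon threads are stopped abruptly at shutdown, so this is only appropriate if the thread holds no resources that need a clean release.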

Helper

DEFAULT_MP_FORMAT = \
    '[%(asctime)s,%(msecs)03d %(threadName)s %(funcName)s]' \
    ' %(message)s'
DEFAULT_DATEFORMAT = "%H:%M:%S"  # "%Y-%m-%d %H:%M:%S"


def get_mp_logger(level=None, fmt=DEFAULT_MP_FORMAT, datefmt=DEFAULT_DATEFORMAT):
    """
    Initialize multiprocessing-logger if needed and return reference.
    """
    import multiprocessing.util as util
    import logging
    logger = util.get_logger()
    if not logger.handlers:
        logger = util.log_to_stderr(level)
    logger.handlers[0].setFormatter(logging.Formatter(fmt, datefmt))
    return logger

Upvotes: 2
