Reputation: 7020
Is it safe to use a single StreamHandler in a multiprocessing environment? More precisely, can it be problematic to have just one StreamHandler that simply prints the logging statements of all processes to stdout? Like this, for example:
import multiprocessing as mp
import logging


def do_log(no):
    # 2nd EDIT: suppose we also do this, which should have no effect
    # if a handler already exists! But under Windows it probably does:
    format = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'
    # This creates a StreamHandler
    logging.basicConfig(format=format, level=logging.INFO)

    # root logger logs Hello world
    logging.getLogger().info('Hello world {}'.format(no))


def main():
    format = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'
    # This creates a StreamHandler
    logging.basicConfig(format=format, level=logging.INFO)

    n_cores = 4
    pool = mp.Pool(n_cores)
    # Log to stdout 100 times concurrently
    pool.map(do_log, range(100))
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()
This will print something like:
ForkPoolWorker-1 root INFO Hello world 0
ForkPoolWorker-3 root INFO Hello world 14
ForkPoolWorker-3 root INFO Hello world 15
ForkPoolWorker-3 root INFO Hello world 16
...
Is this a safe setup? If not, what problems can arise? Anything more serious than garbled console output, i.e. a program crash? If it is safe, is it still safe when using mp.Process instead of mp.Pool?
EDIT: My question concerns any OS, so if there are differences between Linux, OS X, and Windows, don't hesitate to point them out.
2nd EDIT: OK, so under Windows the handler disappears. What happens if we create a new StreamHandler for every process?
Upvotes: 2
Views: 1179
Reputation: 94881
This code will not work at all on Windows, which may or may not be a problem for you. Because Windows doesn't have fork, the logger customization you do in the parent won't get inherited properly by the child.
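(Not part of the original answer: you can reproduce that Windows behavior on Linux/OS X by forcing the 'spawn' start method, available since Python 3.4. A minimal sketch:)

import multiprocessing as mp
import logging


def do_log(no):
    # Under 'spawn' the child starts with a fresh logging setup: the root
    # logger is back at its WARNING default with no handlers, so this
    # INFO call is silently dropped.
    logging.getLogger().info('Hello world {}'.format(no))


def main():
    logging.basicConfig(level=logging.INFO)  # configures only the parent
    ctx = mp.get_context('spawn')            # what Windows always uses
    pool = ctx.Pool(2)
    pool.map(do_log, range(4))               # prints nothing
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()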
On Linux/OS X, the only issue will be the messages from different processes getting garbled together. The multiprocessing documentation mentions this when discussing logging:
Some support for logging is available. Note, however, that the logging package does not use process shared locks so it is possible (depending on the handler type) for messages from different processes to get mixed up.
mp.Pool is implemented using mp.Process, so they'll behave completely equivalently here.
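(Also not from the original answer: for concreteness, a minimal sketch of the same hello-world setup with bare mp.Process. Configuring logging inside the target function keeps it working where processes are spawned rather than forked:)

import multiprocessing as mp
import logging


def do_log(no):
    # Configure logging in the child itself, so this also works where
    # processes are spawned rather than forked (e.g. Windows).
    fmt = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'
    logging.basicConfig(format=fmt, level=logging.INFO)
    logging.getLogger().info('Hello world {}'.format(no))


def main():
    procs = [mp.Process(target=do_log, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()


if __name__ == '__main__':
    main()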
Edit:
If you want something basically equivalent to this that will also work on Windows, you need to run the logging config in each child process, as well as in the parent:
import multiprocessing as mp
import logging


def do_log(no):
    # root logger logs Hello World
    logging.getLogger().info('Hello world {}'.format(no))


def init_log():
    fmt = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'
    logging.basicConfig(format=fmt, level=logging.INFO)


def main():
    # This creates a StreamHandler
    init_log()

    n_cores = 4
    pool = mp.Pool(n_cores, initializer=init_log)
    # Log to stdout 100 times concurrently
    pool.map(do_log, range(100))
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()
logging.getLogger().info("hi")
This gives you something that will have the same issue the original version has on Linux: log messages can still get garbled.
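(A final aside that goes beyond the original answer: if even occasional garbling is unacceptable, the standard library's remedy since Python 3.2 is to route all records through one queue, so only a single thread in the parent ever writes to the stream. A sketch of that pattern, reusing the do_log/init_log names from above:)

import multiprocessing as mp
import logging
import logging.handlers


def init_log(queue):
    # Every worker gets a QueueHandler: records are shipped to the
    # parent instead of being written to stdout by the worker itself.
    root = logging.getLogger()
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.INFO)


def do_log(no):
    logging.getLogger().info('Hello world {}'.format(no))


def main():
    manager = mp.Manager()   # keep a reference so the queue stays alive
    queue = manager.Queue()

    # One StreamHandler, used only by the listener thread in the parent,
    # so lines can no longer interleave.
    fmt = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(fmt))
    listener = logging.handlers.QueueListener(queue, handler)
    listener.start()

    pool = mp.Pool(4, initializer=init_log, initargs=(queue,))
    pool.map(do_log, range(100))
    pool.close()
    pool.join()
    listener.stop()


if __name__ == '__main__':
    main()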
Upvotes: 1