Reputation: 7020
Is it safe to use a single StreamHandler in a multiprocessing environment? More precisely, can it be problematic to have just one StreamHandler that simply prints the logging statements of all processes to stdout? Like this, for example:
import multiprocessing as mp
import logging


def do_log(no):
    # 2nd EDIT: suppose we also do this, which should have no effect
    # if a handler already exists! But under Windows it probably does:
    format = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'
    # This creates a StreamHandler
    logging.basicConfig(format=format, level=logging.INFO)

    # root logger logs Hello world
    logging.getLogger().info('Hello world {}'.format(no))


def main():
    format = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'
    # This creates a StreamHandler
    logging.basicConfig(format=format, level=logging.INFO)

    n_cores = 4
    pool = mp.Pool(n_cores)
    # Log to stdout 100 times concurrently
    pool.map(do_log, range(100))
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()
This will print something like:
ForkPoolWorker-1 root INFO Hello world 0
ForkPoolWorker-3 root INFO Hello world 14
ForkPoolWorker-3 root INFO Hello world 15
ForkPoolWorker-3 root INFO Hello world 16
...
Is this a safe setup? If not, what problems can arise? Anything more serious than garbled console output, i.e. a program crash? If it is safe, is it still safe when using mp.Process instead of mp.Pool?
EDIT: My question concerns any OS, so if there are differences between Linux, OS X, and Windows, don't hesitate to point them out.
2nd EDIT: OK, so under Windows the handler disappears. What happens if we create a new StreamHandler for every process?
Upvotes: 2
Views: 1179
Reputation: 94881
This code will not work at all on Windows, which may or may not be a problem for you. Because Windows doesn't have fork, the logger customization you do in the parent won't get inherited properly by the child.
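(Not part of the original answer: you can reproduce that Windows behavior on Linux/OS X by forcing the 'spawn' start method, available since Python 3.4. A minimal sketch:)

import multiprocessing as mp
import logging


def do_log(no):
    # Under 'spawn' the child starts with a fresh logging setup: the root
    # logger is back at its WARNING default with no handlers, so this
    # INFO call is silently dropped.
    logging.getLogger().info('Hello world {}'.format(no))


def main():
    logging.basicConfig(level=logging.INFO)  # configures only the parent
    ctx = mp.get_context('spawn')            # what Windows always uses
    pool = ctx.Pool(2)
    pool.map(do_log, range(4))               # prints nothing
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()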
On Linux/OS X, the only issue will be the messages from different processes getting garbled together. The multiprocessing documentation mentions this when discussing logging:
Some support for logging is available. Note, however, that the logging package does not use process shared locks so it is possible (depending on the handler type) for messages from different processes to get mixed up.
mp.Pool is implemented using mp.Process, so they'll behave completely equivalently here.
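(Also not from the original answer: for concreteness, a minimal sketch of the same hello-world setup with bare mp.Process. Configuring logging inside the target function keeps it working where processes are spawned rather than forked:)

import multiprocessing as mp
import logging


def do_log(no):
    # Configure logging in the child itself, so this also works where
    # processes are spawned rather than forked (e.g. Windows).
    fmt = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'
    logging.basicConfig(format=fmt, level=logging.INFO)
    logging.getLogger().info('Hello world {}'.format(no))


def main():
    procs = [mp.Process(target=do_log, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()


if __name__ == '__main__':
    main()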
Edit:
If you want something basically equivalent to this that will also work on Windows, you need to run the logging config in each child process, as well as in the parent:
import multiprocessing as mp
import logging


def do_log(no):
    # root logger logs Hello World
    logging.getLogger().info('Hello world {}'.format(no))


def init_log():
    fmt = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'
    logging.basicConfig(format=fmt, level=logging.INFO)


def main():
    # This creates a StreamHandler
    init_log()

    n_cores = 4
    pool = mp.Pool(n_cores, initializer=init_log)
    # Log to stdout 100 times concurrently
    pool.map(do_log, range(100))
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()
logging.getLogger().info("hi")
This gives you something that will have the same issue the original version has on Linux: log messages can still get garbled.
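(A final aside that goes beyond the original answer: if even occasional garbling is unacceptable, the standard library's remedy since Python 3.2 is to route all records through one queue, so only a single thread in the parent ever writes to the stream. A sketch of that pattern, reusing the do_log/init_log names from above:)

import multiprocessing as mp
import logging
import logging.handlers


def init_log(queue):
    # Every worker gets a QueueHandler: records are shipped to the
    # parent instead of being written to stdout by the worker itself.
    root = logging.getLogger()
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.INFO)


def do_log(no):
    logging.getLogger().info('Hello world {}'.format(no))


def main():
    manager = mp.Manager()   # keep a reference so the queue stays alive
    queue = manager.Queue()

    # One StreamHandler, used only by the listener thread in the parent,
    # so lines can no longer interleave.
    fmt = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(fmt))
    listener = logging.handlers.QueueListener(queue, handler)
    listener.start()

    pool = mp.Pool(4, initializer=init_log, initargs=(queue,))
    pool.map(do_log, range(100))
    pool.close()
    pool.join()
    listener.stop()


if __name__ == '__main__':
    main()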
Upvotes: 1