usual me

Reputation: 8788

Confusing behavior of the Python logging module in a multiprocessing context

So I am trying to use logging in a multiprocessing context. Say I want to have each process write its own log file. Here is my setup:

# worker.py
import logging
import os

def worker( x ) :
    """
    Write the value of x in the log file
    """
    logger = logging.getLogger( __name__ )
    pid = os.getpid() # get the process id
    handler = logging.FileHandler( str(pid) + ".log" )  
    logger.addHandler( handler )
    logger.info( "pid={pid}, x={x}".format( **locals() ) )

This is my main:

import logging
import multiprocessing as mp
import worker

# logger
logger = logging.getLogger()
logger.setLevel( logging.INFO )

# handler
handler = logging.FileHandler( 'main.log' )
logger.addHandler( handler )

#
if __name__ == '__main__' :
    pool = mp.Pool( processes=2 )
    pool.map( worker.worker, range(5) )
    pool.close()
    pool.join()

And now the output log files. This is main.log:

pid=1985, x=0
pid=1985, x=2
pid=1986, x=1
pid=1985, x=3
pid=1986, x=4

I think this output is normal: the two child processes propagate their events up to the root logger, which they inherit from the parent. So it does not correspond to the "Logging to a single file from multiple processes" scenario (does it?). But now here is the first child process's log file:

pid=1985, x=0
pid=1985, x=2
pid=1985, x=2
pid=1985, x=3
pid=1985, x=3
pid=1985, x=3

And this is the second child log file:

pid=1986, x=1
pid=1986, x=4
pid=1986, x=4

It looks like each child process writes the entry once for its first input, twice for its second, three times for its third, and so on. What's going on?

Upvotes: 2

Views: 963

Answers (1)

dano

Reputation: 94941

This is happening because every time you enter worker, you get a reference to the same logger object and then add a new FileHandler to it. So if worker is called three times in a given process, that process's logger ends up with three FileHandlers, all of which write to the same file. You should use the initializer keyword argument of Pool to set up the logger once per pool process, at startup:

def init():
    # Runs once in each pool process: attach a single per-process FileHandler.
    logger = logging.getLogger( __name__ )
    pid = os.getpid() # get the process id
    handler = logging.FileHandler( str(pid) + ".log" )
    logger.addHandler( handler )

Then start the pool like this (in your main, where multiprocessing is imported as mp and init lives in the worker module):

pool = mp.Pool( processes=2, initializer=worker.init )

Then worker can just grab the logger object with logging.getLogger(__name__) and use it, without adding another FileHandler to it.
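Putting it together, here is a minimal sketch of what worker.py could look like with this approach; the explicit setLevel call in init is an assumption (the original relies on the root logger's level), added so that INFO records are emitted regardless of how the root logger is configured in the child:

# worker.py
import logging
import os

def init():
    # Runs once in each pool process: configure the module logger and
    # attach exactly one FileHandler named after the process id.
    logger = logging.getLogger( __name__ )
    logger.setLevel( logging.INFO )  # assumption: not in the original setup
    pid = os.getpid()
    handler = logging.FileHandler( str(pid) + ".log" )
    logger.addHandler( handler )

def worker( x ) :
    """
    Write the value of x in the log file
    """
    # Reuse the logger configured by init(); no handler is added here.
    logger = logging.getLogger( __name__ )
    pid = os.getpid()
    logger.info( "pid={pid}, x={x}".format( **locals() ) )

With this setup each record should appear exactly once in the per-process log file, since the handler is attached only once per process rather than once per task.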

Upvotes: 4
