ironv

Reputation: 1058

logger creating multiple files

The following piece of code creates 3 or 4 log files. How should it be written so that only 1 file is created, while keeping the format of the filename? Also, this seems to be a Windows-only issue: when I run this on a RHEL 5.3 box, even with nproc set to 1000, I only get 1 log file. [Python 3.4.1]

import datetime, logging, multiprocessing, os.path, time

logging.basicConfig(level=logging.DEBUG, format="%(asctime)-4s %(process)6s  %(message)s", datefmt="%m-%d %H:%M:%S",
    filename="test_%s.log"%(datetime.datetime.today().strftime("%Y%m%d-%H%M%S")))

def worker(n):
    logging.info("before")
    time.sleep(n)
    logging.info("after")

if __name__=='__main__':
    nproc = 40
    pool = multiprocessing.Pool(processes=nproc)
    pool.map(worker, (6,)*nproc)

Upvotes: 2

Views: 2128

Answers (2)

dano

Reputation: 94881

There are a couple of things you need to do:

  1. Make sure you create the name of the file once, and share it across all processes. daveydave400 covered this in his answer.
  2. Make sure write operations across all processes are synchronized. If you don't do this, your worker processes will try to write to the log concurrently, which can result in one write stomping on another. Avoiding this was not covered in the other answer.

The logging module internally uses a threading.RLock to synchronize writes, but that lock is not process-safe: each process gets its own RLock that knows nothing about the others. So you need to create your own logging.Handler that uses a process-safe lock, and then share that lock with every process in the pool. We can do that by taking advantage of the initializer/initargs keyword arguments of multiprocessing.Pool: we pass all the parameters needed to build identical loggers to an initializer function, which then creates a global logging.Logger object in each worker process.

Here's a working example:

import datetime, logging, multiprocessing, os.path, time

log = None
class ProcessFileHandler(logging.FileHandler):
    def __init__(self, *args, **kwargs):
        if 'lock' in kwargs:
            self._lock = kwargs.pop('lock')
        else:
            raise ValueError("No 'lock' keyword argument provided")
        super(ProcessFileHandler, self).__init__(*args, **kwargs)

    def createLock(self):
        # Handler.__init__ calls createLock(); install the shared
        # multiprocessing lock instead of a per-process threading.RLock.
        self.lock = self._lock

def setup_logging(level, filename, format, datefmt, lock):
    global log  # Creates a global log variable in each process
    log = logging.getLogger()
    handler = ProcessFileHandler(filename, lock=lock)
    log.setLevel(level)
    fmt = logging.Formatter(fmt=format, datefmt=datefmt)
    handler.setFormatter(fmt)
    log.addHandler(handler)

def worker(n):
    log.info("before")
    time.sleep(n)
    log.info("after")

if __name__=='__main__':
    nproc = 40
    # initialize all the logging attributes we need
    format="%(asctime)-4s %(process)6s  %(message)s"
    datefmt="%m-%d %H:%M:%S"
    filename="test_%s.log"%(datetime.datetime.today().strftime("%Y%m%d-%H%M%S"))
    level=logging.DEBUG
    lock = multiprocessing.RLock()
    setup_logging(level, filename, format, datefmt, lock)  # Create one for this process

    pool = multiprocessing.Pool(processes=nproc, initializer=setup_logging, initargs=(level, filename, format, datefmt, lock))
    pool.map(worker, (6,)*nproc)

Upvotes: 0

djhoese

Reputation: 3667

Original Answer: Put the logging.basicConfig call inside your if __name__ == '__main__' block.

New Answer: Add the logging.basicConfig call to the beginning of the worker function. You could put it into a function named setup_logging for future changes. My previous answer probably won't work because the workers won't have logging set up.

When multiprocessing runs a new target on Windows, it has to start a whole new interpreter, which then imports the module containing the function to run. Since your logging.basicConfig call sits at module level, it runs every time the module is loaded. I'm guessing that only one file being created on Linux is just luck.
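
Here is a minimal sketch of that behaviour (the worker function here is made up purely for illustration): under the Windows "spawn" start method each pool worker re-imports the main module, so anything at module level runs once per process; under "fork" on Linux the children inherit the already-configured parent instead.

import multiprocessing, os

# Module-level code: with the "spawn" start method (the Windows default) this
# runs once in the parent and again in every worker process, because each
# worker re-imports this module. With "fork" (the Linux default) the workers
# inherit the parent's memory, so it runs only once.
print("module loaded in pid %d" % os.getpid())

def worker(n):
    return n * n

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    print(pool.map(worker, range(4)))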

I should note that the normal pattern for logging (at least that I use) is:

import logging

log = logging.getLogger(__name__)

# Code
def a_func():
    log.debug("This is my message")

if __name__ == "__main__":
    logging.basicConfig(...)
    a_func()

Edit: If it wasn't clear what I meant by luck: you are creating a filename from the current date/time, so having every process start within the same second is just luck. If the operating system takes too long to start up X number of workers, you'll get different filenames. On Linux it's possible the OS is doing something smart to load all the interpreters/modules at the same time and then call the function.

Your code: I should apologize for writing Stack Overflow answers when tired. My previous solution doesn't actually solve the problem. This does:

import datetime, logging, multiprocessing, os.path, time

def worker(args):
    # it's not simple to pass more than one argument so we expand the tuple here
    n, log_filename = args
    logging.basicConfig(level=logging.DEBUG,
            format="%(asctime)-4s %(process)6s  %(message)s",
            datefmt="%m-%d %H:%M:%S",
            filename=log_filename)

    logging.info("before")
    time.sleep(n)
    logging.info("after")

if __name__=='__main__':
    filename="test_%s.log" % (datetime.datetime.today().strftime("%Y%m%d-%H%M%S"),)
    nproc = 40
    pool = multiprocessing.Pool(processes=nproc)
    pool.map(worker, ((6, filename),)*nproc)

Upvotes: 1
