Reputation: 1058
The following piece of code creates 3 or 4 log files. How should it be written so that only 1 file is created, while keeping the format of the filename? Also, this seems to be a Windows-only issue: when I run this on a RHEL 5.3 box, even with nproc set to 1000, I only get 1 log file. [Python 3.4.1]
import datetime, logging, multiprocessing, os.path, time

logging.basicConfig(level=logging.DEBUG, format="%(asctime)-4s %(process)6s %(message)s", datefmt="%m-%d %H:%M:%S",
                    filename="test_%s.log" % (datetime.datetime.today().strftime("%Y%m%d-%H%M%S")))

def worker(n):
    logging.info("before")
    time.sleep(n)
    logging.info("after")

if __name__ == '__main__':
    nproc = 40
    pool = multiprocessing.Pool(processes=nproc)
    pool.map(worker, (6,)*nproc)
Upvotes: 2
Views: 2128
Reputation: 94881
There are a couple of things you need to do:

1. The logging module internally uses a threading.RLock to synchronize writes, but this is not process-safe: each process gets its own RLock that knows nothing about the others. So, you need to create your own logging.Handler that uses a process-safe lock, and then share that lock with every process in your pool.
2. We can share the lock by taking advantage of the initializer/initargs keyword arguments to multiprocessing.Pool: we just pass all the parameters we need to create identical loggers in all the worker processes to an initializer function, and then build a global logging.Logger object for each process.
Here's a working example:
import datetime, logging, multiprocessing, os.path, time

log = None

class ProcessFileHandler(logging.FileHandler):
    def __init__(self, *args, **kwargs):
        if 'lock' in kwargs:
            # must be set before __init__, since FileHandler.__init__ calls createLock()
            self._lock = kwargs.pop('lock')
        else:
            raise ValueError("No 'lock' keyword argument provided")
        super(ProcessFileHandler, self).__init__(*args, **kwargs)

    def createLock(self):
        # use the shared multiprocessing lock instead of the default threading.RLock
        self.lock = self._lock

def setup_logging(level, filename, format, datefmt, lock):
    global log  # creates a global log variable in each process
    log = logging.getLogger()
    handler = ProcessFileHandler(filename, lock=lock)
    log.setLevel(level)
    fmt = logging.Formatter(fmt=format, datefmt=datefmt)
    handler.setFormatter(fmt)
    log.addHandler(handler)

def worker(n):
    log.info("before")
    time.sleep(n)
    log.info("after")

if __name__ == '__main__':
    nproc = 40

    # initialize all the logging attributes we need
    format = "%(asctime)-4s %(process)6s %(message)s"
    datefmt = "%m-%d %H:%M:%S"
    filename = "test_%s.log" % (datetime.datetime.today().strftime("%Y%m%d-%H%M%S"))
    level = logging.DEBUG
    lock = multiprocessing.RLock()

    setup_logging(level, filename, format, datefmt, lock)  # create one for this process
    pool = multiprocessing.Pool(processes=nproc, initializer=setup_logging,
                                initargs=(level, filename, format, datefmt, lock))
    pool.map(worker, (6,)*nproc)
Upvotes: 0
Reputation: 3667
Original Answer: Put the logging.basicConfig call in your if main statement.

New Answer: Add the logging call to the beginning of the worker function. You could put it into a function named setup_logging for future changes. My previous answer probably won't work because the workers won't have logging set up.
When multiprocessing runs a new target it has to start a whole new interpreter, which then loads the module containing the function to run. Where you have your logging call now, it will be run every time the module is loaded. I'm guessing that only creating one file on Linux is just luck.
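Here is a minimal sketch (not from either answer; the demo_reimport.py filename and the worker are made up for illustration) of that re-import behaviour. Under the "spawn" start method that Windows uses, the module-level print runs once per worker interpreter; under "fork" (the Linux default) it only runs in the parent:

# demo_reimport.py -- hypothetical filename for this sketch
import multiprocessing, os

# module-level code: re-executed in every worker under "spawn", but not under "fork"
print("module loaded in pid %s" % os.getpid())

def worker(n):
    return n * n

if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")  # force the Windows-style behaviour (Python 3.4+)
    pool = multiprocessing.Pool(processes=2)
    print(pool.map(worker, [1, 2, 3]))
    pool.close()
    pool.join()

With "spawn" the "module loaded" line appears once per worker process in addition to the parent, which is exactly why the timestamped filename in the question gets re-evaluated on Windows.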
I should note that the normal pattern for logging (at least that I use) is:
import logging

log = logging.getLogger(__name__)

# Code

def a_func():
    log.debug("This is my message")

if __name__ == "__main__":
    logging.basicConfig(...)
    a_func()
Edit: If it wasn't clear what I meant by luck: you are creating a filename containing the current date/time, so getting all the processes configured within the same second is just luck. If the operating system takes too long to start up X workers, you'll get different filenames. On Linux the module-level call isn't actually repeated at all, because multiprocessing forks the parent process there instead of starting fresh interpreters, so the workers simply inherit the configuration made at import time.
Your code: I should apologize for writing Stack Overflow answers when tired. My previous solution doesn't actually solve the problem. This does:
import datetime, logging, multiprocessing, os.path, time

def worker(args):
    # it's not simple to pass more than one argument so we expand the tuple here
    n, log_filename = args
    logging.basicConfig(level=logging.DEBUG,
                        format="%(asctime)-4s %(process)6s %(message)s",
                        datefmt="%m-%d %H:%M:%S",
                        filename=log_filename)
    logging.info("before")
    time.sleep(n)
    logging.info("after")

if __name__ == '__main__':
    filename = "test_%s.log" % (datetime.datetime.today().strftime("%Y%m%d-%H%M%S"),)
    nproc = 40
    pool = multiprocessing.Pool(processes=nproc)
    pool.map(worker, ((6, filename),)*nproc)
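As a small follow-up sketch (only the setup_logging name comes from the answer above; the rest just refactors the snippet), the per-worker configuration mentioned earlier could be pulled out into a setup_logging function:

import datetime, logging, multiprocessing, time

def setup_logging(log_filename):
    # configure the root logger in this worker process; repeat calls are no-ops
    logging.basicConfig(level=logging.DEBUG,
                        format="%(asctime)-4s %(process)6s %(message)s",
                        datefmt="%m-%d %H:%M:%S",
                        filename=log_filename)

def worker(args):
    n, log_filename = args
    setup_logging(log_filename)
    logging.info("before")
    time.sleep(n)
    logging.info("after")

if __name__ == '__main__':
    filename = "test_%s.log" % (datetime.datetime.today().strftime("%Y%m%d-%H%M%S"),)
    pool = multiprocessing.Pool(processes=40)
    pool.map(worker, ((6, filename),) * 40)

Because a Pool reuses its worker processes and logging.basicConfig does nothing once the root logger has handlers, each worker is configured only once, and every record goes to the single file named in the parent.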
Upvotes: 1