Reputation: 470
My practice in Python multi-threading is rather poor. So, now, I'm studying how to get log info from multiple threads. I saw a lot of different approaches, but want to start from simple one, I guess. So the task is to create several threads and log data from each of them. In order to recognize the source of logs, I want to put some custom tag in the log output. I know that logging lib has a reach LogRecord attributes (thread, threadName etc.) and it works well. So, I've got some example (logging-from-multiple-threads) and make some modification. Here is a complete code:
import logging
import threading
import time
logger = logging.getLogger()
syslog = logging.StreamHandler()
formatter = logging.Formatter('%(project)s : %(thread)x '
'%(levelname)-8s '
'%(message)s')
syslog.setFormatter(formatter)
logger.setLevel(logging.DEBUG)
logger.addHandler(syslog)
class ContextFilter(logging.Filter):
def __init__(self, project):
super(ContextFilter, self).__init__()
self.project = project
def filter(self, record):
record.project = self.project
return True
def worker(args):
while not args['stop']:
logging.debug('Hi from {}'.format(args['project']))
time.sleep(0.5)
def main():
projects = ['project_1', 'project_2']
info = {'stop': False}
threads = []
for project in projects:
info['project'] = project
logger.addFilter(ContextFilter(project))
thread = threading.Thread(target=worker, args=(info,))
thread.start()
threads.append(thread)
while True:
try:
logging.debug('Hello from main')
time.sleep(1.75)
except KeyboardInterrupt:
info['stop'] = True
break
for t in threads:
t.join()
if __name__ == '__main__':
main()
And here are the output results:
project_2 : 7fa627e77700 DEBUG Hi from project_2
project_2 : 7fa6293d0700 DEBUG Hello from main
project_2 : 7fa627676700 DEBUG Hi from project_2
project_2 : 7fa627e77700 DEBUG Hi from project_2
project_2 : 7fa627676700 DEBUG Hi from project_2
project_2 : 7fa627e77700 DEBUG Hi from project_2
project_2 : 7fa627676700 DEBUG Hi from project_2
project_2 : 7fa627e77700 DEBUG Hi from project_2
project_2 : 7fa627676700 DEBUG Hi from project_2
project_2 : 7fa6293d0700 DEBUG Hello from main
project_2 : 7fa627e77700 DEBUG Hi from project_2
Actually, this is not what I was expecting. Can you give me some idea about what I am doing wrong?
Upvotes: 3
Views: 4232
Reputation: 824
Part of your problem comes from your passing of object variables. When you pass args=(info,)
you are passing a reference to an object (which you later modify and pass to the next object), not a copy of the object. Passing the same object to multiple threads can become dangerous, allowing for the possibility for race conditions
Firstly we can remove our ContextFilter. We were adding them to the global logger, and not keeping track of anything per thread.
import logging
import threading
import time
logger = logging.getLogger()
syslog = logging.StreamHandler()
formatter = logging.Formatter('%(project)s : %(thread)x '
'%(levelname)-8s '
'%(message)s')
syslog.setFormatter(formatter)
logger.setLevel(logging.DEBUG)
logger.addHandler(syslog)
I find in general building threading.Thread
classes to be more useful for all but the simplest of tasks.
This class maintains its own running
state and builds its own logging adapter with the correct extra
data.
class Worker(threading.Thread):
def __init__(self, info):
self.running=False
self.info=info
self.logger=logging.LoggerAdapter(logger, self.info)
super(Worker, self).__init__()
def start(self):
self.running=True
super(Worker, self).start()
def stop(self):
self.running=False
def run(self):
while self.running:
self.logger.debug('Hi from {}'.format(self.info['project']))
time.sleep(0.5)
And now we need to change a few things. We need to use our own Worker
class.
We don't need to do anything to the logger, the class will manage its own LoggerAdapter.
We want to make sure we are creating a new info object each time, this is so simple we can just pass it straight in the function call ({'project': project}
) without assigning a variable.
We need to make sure we pass our project
variable when logging from the main thread. This is perhaps better done with another LoggerAdapter.
Once we break our loop we can ask each thread to stop, and then wait for each thread (join()
could perhaps be moved into the stop
method of the Worker
class)
def main():
projects = ['project_1', 'project_2']
threads = []
for project in projects:
thread = Worker({'project': project})
thread.start()
threads.append(thread)
while True:
try:
logging.debug('Hello from main', extra={'project':'main'})
time.sleep(1.75)
except KeyboardInterrupt:
break
for t in threads:
t.stop()
for t in threads:
t.join()
if __name__ == '__main__':
main()
This code yields results such as
project_1 : 7f4b44180700 DEBUG Hi from project_1
project_2 : 7f4b4397f700 DEBUG Hi from project_2
main : 7f4b45c8d700 DEBUG Hello from main
project_1 : 7f4b44180700 DEBUG Hi from project_1
project_2 : 7f4b4397f700 DEBUG Hi from project_2
project_1 : 7f4b44180700 DEBUG Hi from project_1
project_2 : 7f4b4397f700 DEBUG Hi from project_2
project_1 : 7f4b44180700 DEBUG Hi from project_1
project_2 : 7f4b4397f700 DEBUG Hi from project_2
main : 7f4b45c8d700 DEBUG Hello from main
project_1 : 7f4b44180700 DEBUG Hi from project_1
There are many ways the code could be tidied up and made more readable, but this should at least give you a few starting points to learn from and start experimenting. As you learn more about threads you should also read about thread synchronization mechanisms. I have recently started using Queue
s for communication between threads, leading code that's easier to debug.
Upvotes: 3
Reputation: 1642
What did you expect exactly ? If you wonder why "Hi from" does not show project name, try:
logging.debug('Hi from {}'.format(args['project']))
EDIT: to answer your comment, in fact you do get logs from both of threads.
But your info
object is shared. When you pass args=(info,))
, in fact you pass a reference to the info
object, not a copy.
So the first time the for
loop executes, you get info['project'] = "project_1"
, but the in the second time info['project']
is overwritten with "project_2".
Your worker threads just read the values from the same info
dict...
Upvotes: 2