machin
machin

Reputation: 470

Implementing custom ouput logging from multiple threads in python

My practice in Python multi-threading is rather poor. So, now, I'm studying how to get log info from multiple threads. I saw a lot of different approaches, but want to start from simple one, I guess. So the task is to create several threads and log data from each of them. In order to recognize the source of logs, I want to put some custom tag in the log output. I know that logging lib has a reach LogRecord attributes (thread, threadName etc.) and it works well. So, I've got some example (logging-from-multiple-threads) and make some modification. Here is a complete code:

import logging
import threading
import time

logger = logging.getLogger()
syslog = logging.StreamHandler()
formatter = logging.Formatter('%(project)s : %(thread)x '
                              '%(levelname)-8s '
                              '%(message)s')
syslog.setFormatter(formatter)
logger.setLevel(logging.DEBUG)
logger.addHandler(syslog)


class ContextFilter(logging.Filter):

    def __init__(self, project):
        super(ContextFilter, self).__init__()
        self.project = project

    def filter(self, record):
        record.project = self.project
        return True


def worker(args):
    while not args['stop']:
        logging.debug('Hi from {}'.format(args['project']))
        time.sleep(0.5)


def main():
    projects = ['project_1', 'project_2']
    info = {'stop': False}
    threads = []
    for project in projects:
        info['project'] = project
        logger.addFilter(ContextFilter(project))
        thread = threading.Thread(target=worker, args=(info,))
        thread.start()
        threads.append(thread)
    while True:
        try:
            logging.debug('Hello from main')
            time.sleep(1.75)
        except KeyboardInterrupt:
            info['stop'] = True
            break
    for t in threads:
        t.join()

if __name__ == '__main__':
    main()

And here are the output results:

project_2 : 7fa627e77700 DEBUG    Hi from project_2
project_2 : 7fa6293d0700 DEBUG    Hello from main
project_2 : 7fa627676700 DEBUG    Hi from project_2
project_2 : 7fa627e77700 DEBUG    Hi from project_2
project_2 : 7fa627676700 DEBUG    Hi from project_2
project_2 : 7fa627e77700 DEBUG    Hi from project_2
project_2 : 7fa627676700 DEBUG    Hi from project_2
project_2 : 7fa627e77700 DEBUG    Hi from project_2
project_2 : 7fa627676700 DEBUG    Hi from project_2
project_2 : 7fa6293d0700 DEBUG    Hello from main
project_2 : 7fa627e77700 DEBUG    Hi from project_2

Actually, this is not what I was expecting. Can you give me some idea about what I am doing wrong?

Upvotes: 3

Views: 4232

Answers (2)

Trevor Ian Peacock
Trevor Ian Peacock

Reputation: 824

Part of your problem comes from your passing of object variables. When you pass args=(info,) you are passing a reference to an object (which you later modify and pass to the next object), not a copy of the object. Passing the same object to multiple threads can become dangerous, allowing for the possibility for race conditions

Firstly we can remove our ContextFilter. We were adding them to the global logger, and not keeping track of anything per thread.

import logging
import threading
import time

logger = logging.getLogger()
syslog = logging.StreamHandler()
formatter = logging.Formatter('%(project)s : %(thread)x '
                              '%(levelname)-8s '
                              '%(message)s')
syslog.setFormatter(formatter)
logger.setLevel(logging.DEBUG)
logger.addHandler(syslog)

I find in general building threading.Thread classes to be more useful for all but the simplest of tasks.

This class maintains its own running state and builds its own logging adapter with the correct extra data.

class Worker(threading.Thread):
    def __init__(self, info):
        self.running=False
        self.info=info
        self.logger=logging.LoggerAdapter(logger, self.info)
        super(Worker, self).__init__()
    def start(self):
        self.running=True
        super(Worker, self).start()
    def stop(self):
        self.running=False
    def run(self):
        while self.running:
            self.logger.debug('Hi from {}'.format(self.info['project']))
            time.sleep(0.5)

And now we need to change a few things. We need to use our own Worker class.

We don't need to do anything to the logger, the class will manage its own LoggerAdapter.

We want to make sure we are creating a new info object each time, this is so simple we can just pass it straight in the function call ({'project': project}) without assigning a variable.

We need to make sure we pass our project variable when logging from the main thread. This is perhaps better done with another LoggerAdapter.

Once we break our loop we can ask each thread to stop, and then wait for each thread (join() could perhaps be moved into the stop method of the Worker class)

def main():
    projects = ['project_1', 'project_2']
    threads = []
    for project in projects:
        thread = Worker({'project': project})
        thread.start()
        threads.append(thread)
    while True:
        try:
            logging.debug('Hello from main', extra={'project':'main'})
            time.sleep(1.75)
        except KeyboardInterrupt:
            break
    for t in threads:
        t.stop()
    for t in threads:
        t.join()

if __name__ == '__main__':
    main()

This code yields results such as

project_1 : 7f4b44180700 DEBUG    Hi from project_1
project_2 : 7f4b4397f700 DEBUG    Hi from project_2
main : 7f4b45c8d700 DEBUG    Hello from main
project_1 : 7f4b44180700 DEBUG    Hi from project_1
project_2 : 7f4b4397f700 DEBUG    Hi from project_2
project_1 : 7f4b44180700 DEBUG    Hi from project_1
project_2 : 7f4b4397f700 DEBUG    Hi from project_2
project_1 : 7f4b44180700 DEBUG    Hi from project_1
project_2 : 7f4b4397f700 DEBUG    Hi from project_2
main : 7f4b45c8d700 DEBUG    Hello from main
project_1 : 7f4b44180700 DEBUG    Hi from project_1

There are many ways the code could be tidied up and made more readable, but this should at least give you a few starting points to learn from and start experimenting. As you learn more about threads you should also read about thread synchronization mechanisms. I have recently started using Queues for communication between threads, leading code that's easier to debug.

Upvotes: 3

Stephane Martin
Stephane Martin

Reputation: 1642

What did you expect exactly ? If you wonder why "Hi from" does not show project name, try:

logging.debug('Hi from {}'.format(args['project']))

EDIT: to answer your comment, in fact you do get logs from both of threads. But your infoobject is shared. When you pass args=(info,)), in fact you pass a reference to the info object, not a copy.

So the first time the for loop executes, you get info['project'] = "project_1", but the in the second time info['project'] is overwritten with "project_2".

Your worker threads just read the values from the same info dict...

Upvotes: 2

Related Questions