Chris R

Reputation: 17926

How can I capture all of the Python log records generated during the execution of a series of Celery tasks?

I want to convert my homegrown task queue system into a Celery-based task queue, but one feature I currently have is causing me some distress.

Right now, my task queue operates very coarsely; I run the job (which generates data and uploads it to another server), collect the logging using a variant on Nose's log capture library, and then I store the logging for the task as a detailed result record in the application database.

I would like to break this down as three tasks:

  1. collect data
  2. upload data
  3. report results (including all logging from the preceding two tasks)

The real kicker here is the logging collection. Right now, using the log capture, I have a series of log records for each log call made during the data generation and upload process. These are required for diagnostic purposes. Given that the tasks are not even guaranteed to run in the same process, it's not clear how I would accomplish this in a Celery task queue.

My ideal solution would be a trivial and minimally invasive method of capturing all logging during the predecessor tasks (1, 2) and making it available to the reporter task (3).

Am I best off remaining fairly coarse-grained with my task definition and putting all of this work in one task? Or is there a way to pass the existing captured logging around in order to collect it at the end?
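To make that concrete, here is a rough sketch of the shape I'm hoping for (the CapturingHandler class, the task names, and the chain wiring below are hypothetical illustrations, not code I have working): each task captures its own records and passes them forward, so the reporter task can aggregate everything at the end.

import logging

from celery import chain, shared_task  # assumes Celery >= 3.x

class CapturingHandler(logging.Handler):
    """Collects formatted records in memory for the duration of one task."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.records = []

    def emit(self, record):
        self.records.append(self.format(record))

def capture(logger_name="myjob"):
    # Attach a fresh capturing handler to the job's logger.
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    handler = CapturingHandler()
    logger.addHandler(handler)
    return logger, handler

@shared_task
def collect_data(job_id):
    logger, handler = capture()
    try:
        logger.info("collecting data for job %s", job_id)
        data = {"job": job_id}  # stand-in for the real collection work
    finally:
        logger.removeHandler(handler)
    return {"data": data, "logs": handler.records}

@shared_task
def upload_data(payload):
    logger, handler = capture()
    try:
        logger.info("uploading %r", payload["data"])
    finally:
        logger.removeHandler(handler)
    # Forward the predecessor's log lines together with our own.
    return {"logs": payload["logs"] + handler.records}

@shared_task
def report_results(payload):
    # Store the combined log as the detailed result record.
    return "\n".join(payload["logs"])

# chain(collect_data.s(42), upload_data.s(), report_results.s()).delay()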

Upvotes: 5

Views: 1917

Answers (3)

Alexander Lebedev

Reputation: 6044

I assume you are using the logging module. You can use a separate named logger per task set to do the job; it will inherit all configuration from the parent logger.

In task.py:

import logging

from celery.decorators import task  # import path for the task decorator varies with Celery version

@task
def step1(*args, **kwargs):
    # `key` is some unique identifier common for a piece of data in all steps of processing
    key = kwargs["key"]
    logger = logging.getLogger("myapp.tasks.processing.%s" % key)
    # ...
    logger.info("step1 finished for %s", key)  # log something

@task
def step2(*args, **kwargs):
    key = kwargs["key"]
    logger = logging.getLogger("myapp.tasks.processing.%s" % key)
    # ...
    logger.info("step2 finished for %s", key)  # log something

Here, all records are sent to the same named logger. You can then use two approaches to fetch those records:

  1. Configure a file handler whose filename depends on the logger name. After the last step, just read all of the info from that file. Make sure output buffering is disabled for this handler, or you risk losing records.

  2. Create a custom handler that accumulates records in memory and then returns them all when asked; a rough sketch follows below. I'd use memcached for the storage here, since it's simpler than creating your own cross-process storage.
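A minimal sketch of the second approach, assuming plain in-memory accumulation keyed by logger name (AccumulatingHandler and get_records are invented names; for Celery's multi-process workers you would swap the module-level dict for memcached or another shared store, as suggested above):

import logging
from collections import defaultdict

# Replace this dict with memcached (or another cross-process store) when the
# steps may run in different worker processes.
_captured = defaultdict(list)

class AccumulatingHandler(logging.Handler):
    """Stores formatted records under the name of the logger that emitted them."""
    def emit(self, record):
        _captured[record.name].append(self.format(record))

def get_records(logger_name):
    """Return (and clear) everything logged to the given named logger."""
    return _captured.pop(logger_name, [])

# Attach once, e.g. at worker start-up, to the common parent logger; records
# from child loggers such as "myapp.tasks.processing.<key>" propagate up to it.
parent = logging.getLogger("myapp.tasks.processing")
parent.setLevel(logging.INFO)
parent.addHandler(AccumulatingHandler())

The reporting step would then call get_records("myapp.tasks.processing.%s" % key) to collect everything the earlier steps logged for that key.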

Upvotes: 1

Brendon Crawford

Reputation: 1915

Django Sentry is a logging utility for Python (and Django), and has support for Celery.

Upvotes: 0

Lloyd Moore

Reputation: 3197

It sounds like some kind of 'watcher' would be ideal. If you can watch and consume the logs as a stream, you could slurp the results as they come in. Since the watcher would run separately and therefore have no dependencies on what it is watching, I believe this would satisfy your requirements for a non-invasive solution.
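Purely as a sketch of how such a watcher could be wired up (the LogWatcher class and the port number are made up for illustration): each worker attaches a standard logging.handlers.SocketHandler to its loggers, and a separate watcher process consumes the pickled records as they stream in.

import logging
import logging.handlers
import pickle
import socketserver
import struct

class LogWatcher(socketserver.StreamRequestHandler):
    """Consumes pickled LogRecords streamed by logging.handlers.SocketHandler."""
    def handle(self):
        while True:
            header = self.connection.recv(4)
            if len(header) < 4:
                break
            length = struct.unpack(">L", header)[0]
            data = self.connection.recv(length)
            while len(data) < length:
                data += self.connection.recv(length - len(data))
            record = logging.makeLogRecord(pickle.loads(data))
            # Slurp the record as it arrives, e.g. append it to the job's
            # detailed result record keyed by record.name.
            print(record.name, record.getMessage())

if __name__ == "__main__":
    # Each Celery worker would add
    #   logging.handlers.SocketHandler("localhost", 9020)
    # to its loggers so every record streams to this watcher process.
    socketserver.ThreadingTCPServer(("localhost", 9020), LogWatcher).serve_forever()

Since the watcher owns its own storage, the tasks themselves only need the extra handler, which keeps the change to them minimal.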

Upvotes: 0
