Milano

Reputation: 18745

Scrapy/Celery logging settings don't work

I'm running a Scrapy spider as a Celery task.

The problem is that Scrapy doesn't log into the file scrapy.log; it logs into the Celery log instead. Moreover, I can see DEBUG-level messages inside the Celery log, and I don't know whether that's caused by the Celery settings or the Scrapy settings.

[2017-07-17 05:49:20,848: WARNING/Process-4:1] 2017-07-17 05:49:20 [spider_1] INFO: Skipping telephone request...

[2017-07-17 05:49:22,277: DEBUG/Process-4:1] Crawled (200) https://w...

I've set the Celery logging level to INFO:

celery worker -A realestate_scanner -l info --logfile=logs/celery.log --purge

And I've set LOG_LEVEL and LOG_FILE in Scrapy's settings.py:

LOG_LEVEL = 'INFO'
LOG_FILE = django_dir+'/logs/scrapy.log'

The Scrapy project lives inside one of my Django apps.

This is celery.py:

from __future__ import absolute_import
import os
from celery import Celery
from django.conf import settings

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'realestate_scanner.settings')
app = Celery('realestate_scanner')


app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

And I have Django in DEBUG mode.

Do you know what to do? How can I set the Celery log level to INFO and make Scrapy log to its own file?

EDIT:

This is how I'm running the spider:

# imports assumed from context; Job and Processor come from the scrapyscript package
import logging
from datetime import timedelta
from logging.handlers import RotatingFileHandler

from celery.task import periodic_task
from scrapy.utils.project import get_project_settings
from scrapyscript import Job, Processor

logger = logging.getLogger(__file__)
handler = RotatingFileHandler(filename=__file__ + '.log', maxBytes=64 * 64 * 64 * 64)
logger.addHandler(handler)

@periodic_task(run_every=timedelta(minutes=5))
def daily_scanning():
    settings = get_project_settings()
    settings.overrides['LOG_FILE'] = 'scrapy.log'
    logger.info('Scanning started') # this is also logged into celery.log
    job = Job(TopRealitySpider1())
    Processor().run(job)
    logger.info('Scanning stopped')

settings.py (Django):

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'simple': {
            'format': '%(levelname)s %(message)s',
            'datefmt': '%y %b %d, %H:%M:%S',
        },
    },
    'handlers': {
        'console': {
            'level': 'INFO',
            'class': 'logging.StreamHandler',
            'formatter': 'simple'
        },
        'celery': {
            'level': 'INFO',
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': 'logs/celery.log',
            'formatter': 'simple',
            'maxBytes': 1024 * 1024 * 100,  # 100 mb
        },
        'scrapy': {
            'level': 'DEBUG',
            'class': 'logging.handlers.TimedRotatingFileHandler',
            'filename': os.path.join(BASE_DIR, 'logs/scrapy.log'),
            'formatter': 'simple'
        }
    },
    'loggers': {
        'celery': {
            'handlers': ['celery', 'console'],
            'level': 'INFO',
        },
        'scrapy': {
            'handlers': ['scrapy'],
            'level': 'INFO',
        }
    }
}

from logging.config import dictConfig

dictConfig(LOGGING)

RUN CELERY WORKER:

celery worker -A realestate_scanner -l info --logfile=logs/celery.log --purge # When I remove --logfile, everything goes into the console

START SHELL_PLUS:

>>> tasks.daily_scanning.delay()

RESULT:

One celery.log file:

[2017-07-17 11:10:47,468: INFO/Process-1:1] Skipping telephone request...

[2017-07-17 11:10:47,468: WARNING/Process-1:1] 2017-07-17 11:10:47 [topreality_1] INFO: Skipping telephone request...

[2017-07-17 11:10:48,680: DEBUG/Process-1:1] Crawled (200)

I've tried to do this (alfonso.kim's answer):

import logging
logger = logging.getLogger('scrapy')
...
logger.debug('something')

This works, but native Scrapy messages like "Crawled" still go into the Celery log.

I tried to do this in the spider's __init__:

self.logger = logging.getLogger('scrapy') 

which raises an error (can't set attribute), because Spider.logger is a read-only property.

So there are two problems:

1. Celery logs DEBUG messages.
2. Scrapy logs into celery.log.

Upvotes: 4

Views: 2834

Answers (4)

UselessFire

Reputation: 93

alfonso.kim's answer doesn't work because Django ignores all non-uppercase settings values. I spent a lot of time debugging this.

Celery will ignore uppercase settings starting with 6.0.

But for now (version 5.2) you need to ignore the Celery documentation and add the uppercase setting to your settings.py:

CELERY_WORKER_HIJACK_ROOT_LOGGER = False

Celery won't override the logging configuration after that.

Don't forget to enable the CELERY namespace in yourproject/celery.py:

app.config_from_object('django.conf:settings', namespace='CELERY')

After that, you can add any Celery setting to your settings.py, but only in uppercase.
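Put together, a minimal sketch (the setting name comes from this answer; the project name is taken from the question):

# settings.py (Django only honors uppercase settings)
CELERY_WORKER_HIJACK_ROOT_LOGGER = False  # Celery leaves the root logger alone

# realestate_scanner/celery.py
from celery import Celery

app = Celery('realestate_scanner')
# with namespace='CELERY', only CELERY_-prefixed uppercase settings are read
app.config_from_object('django.conf:settings', namespace='CELERY')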

Upvotes: 1

Mojtaba Asadi

Reputation: 110

I had the same issue. I found out that Celery captures the root logger output and logs it as DEBUG.

Having this as my task solved my problem:

import logging
import logging.handlers

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor

# app is the project's Celery instance (defined in celery.py)
@app.task(name="crawler")
def crawler(spider):
    log_file = '/home/chamanmah/scrapyd/logs/pricewatch/' + spider + '/a.log'
    setting = get_project_settings()
    runner = CrawlerProcess(setting, install_root_handler=False)
    # create a file log handler
    fh = logging.handlers.RotatingFileHandler(log_file)
    fh.setLevel(logging.DEBUG)
    # replace the root log handler, which is <StreamHandler <stderr> (NOTSET)> before this line
    logging.getLogger().handlers = [fh]
    d = runner.crawl(spider)
    d.addBoth(lambda _: reactor.stop())
    reactor.run()
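With install_root_handler=False, Scrapy never installs its own root handler, so replacing the root logger's handlers with the RotatingFileHandler sends everything Scrapy emits, including the "Crawled" lines, to the file instead of the Celery log.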

Upvotes: 0

alfonso.kim

Reputation: 2900

Celery is overriding the logging configuration.

Try:

CELERY_worker_hijack_root_logger = False

For this to work you should enable the CELERY namespace in yourproject/celery.py:

app.config_from_object('django.conf:settings', namespace='CELERY')

Hope this helps.

Upvotes: -1

alfonso.kim

Reputation: 2900

I don't know if this will work, but you can give it a try.

It seems like Celery is overriding the logging configuration. Since you are using it through Django, you can use Django's logging capabilities:

in settings.py:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s'
        }
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'verbose'
        },
        'scrapy': {
            'level': 'DEBUG',
            'class': 'logging.handlers.TimedRotatingFileHandler',
            'filename': os.path.join(BASE_DIR, 'logs/scrapy.log'),
            'formatter': 'verbose'
        }
    },
    'loggers': {
        'celery': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
        'scrapy': {
            'handlers': ['scrapy'],
            'level': 'DEBUG',
        }
    }
}

and in Scrapy:

import logging
logger = logging.getLogger('scrapy')
...
logger.debug('something')

finally, remove the logfile directive from the celery command.
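That is, the question's command without --logfile:

celery worker -A realestate_scanner -l info --purge

Celery's own messages then go to the console, while the scrapy logger writes to logs/scrapy.log through the handler configured above.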

Hope this helps.

Upvotes: 0
