Devon M

Reputation: 761

Logging in Scrapy

I am having trouble with logging in Scrapy, and most of what I can find online is out of date.

I have set LOG_FILE = "log.txt" in the settings.py file, and according to the documentation, this should work:

Scrapy provides a logger within each Spider instance, which can be accessed and used like this:

import scrapy

class MySpider(scrapy.Spider):

    name = 'myspider'
    start_urls = ['http://scrapinghub.com']

    def parse(self, response):
        self.logger.info('Parse function called on %s', response.url)
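For reference, the relevant lines in my settings.py look like this (LOG_FILE and LOG_LEVEL are the documented Scrapy setting names; the level shown is just the one I happen to use):

```python
# settings.py
LOG_FILE = "log.txt"   # send Scrapy's log output to this file instead of stderr
LOG_LEVEL = "INFO"     # minimum severity to record (Scrapy's default is DEBUG)
```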

But when I do:

class MySpider(CrawlSpider):
    #other code
    def parse_page(self,response):
        self.logger.info("foobar")

I get nothing. If I set

logger = logging.basicConfig(filename="log.txt",level=logging.INFO)

At the top of my file, after my imports, it creates a log file, and the default output gets logged just fine, but

class MySpider(CrawlSpider):
    #other code
    def parse_page(self,response):
        logger.info("foobar")

The "foobar" message never appears. I have also tried putting the basicConfig call in the class __init__, like so:

def __init__(self, *a, **kw):
    super(FanfictionSpider, self).__init__(*a, **kw)
    logging.basicConfig(filename="log.txt",level=logging.INFO)

Once again I get no output to the file, only to the console, and "foobar" does not show up anywhere. Can someone point me to the correct way to log in Scrapy?

Upvotes: 13

Views: 22673

Answers (3)

Sebastián Palma

Reputation: 33420

It seems that you're not calling your parse_page method at any point. Try commenting out your parse method and you'll receive a NotImplementedError, because you're starting the spider while telling it to do nothing.

If you call parse_page from your parse method, it should work:

def parse(self, response):
    self.logger.info('Parse function called on %s', response.url)
    self.parse_page(response)

Upvotes: 2

Rafael Almeida

Reputation: 5240

For logging I just put this in the spider class:

import logging
from scrapy.utils.log import configure_logging 


class SomeSpider(scrapy.Spider):
    configure_logging(install_root_handler=False)
    logging.basicConfig(
        filename='log.txt',
        format='%(levelname)s: %(message)s',
        level=logging.INFO
    )

This will put all Scrapy output into a log.txt file in the project root directory.

If you want to log something manually, you shouldn't use Scrapy's deprecated log module. Just use Python's own logging:

import logging
logging.error("Some error")

Upvotes: 22

mdkb

Reputation: 402

I was unable to make @Rafael Almeida's solution work until I added the following to the import section of my spider.py code:

from scrapy.utils.log import configure_logging 

Upvotes: 2
