JustAStrawberry

Reputation: 3

Unformattable object error

Here is my code:

from scrapy import * 
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector 
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class lala(CrawlSpider):
    name="lala"
    start_url=["http://www.lala.net/"]       
    rule = [Rule(SgmlLinkExtractor(), follow=True, callback='self.parse')] 

    def __init__(self):
        super(lala, self).__init__(self)    
        print "\nworking\n"

    def parse(self,response):        
        print "\n\n Middle \n"  

print "\nend\n"

The problem is:

UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST

Note that both "end" and "working" are printed in this case.

If I remove the __init__ method, there is no error, but parse is never called: the "Middle" message is not printed.

Upvotes: 0

Views: 147

Answers (2)

Talvalin

Reputation: 7889

The Scrapy documentation explicitly warns against overriding the parse method in a CrawlSpider, because CrawlSpider uses parse itself to implement its rule-following logic.

Try renaming your parse method to something like parse_item, point the rule's callback at the new name, and try again.
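
To illustrate why the override matters, here is a toy model in plain Python. ToyCrawlSpider is a hypothetical stand-in and not Scrapy's real implementation; it only assumes, as the documentation describes, that the base class's own parse method is what applies the rules. The URL and the class names are made up for the example.

```python
# Toy stand-in for CrawlSpider (hypothetical, NOT Scrapy's actual code).
class ToyCrawlSpider(object):
    # In this toy model a rule is just a (link, callback_name) pair.
    rules = []

    def parse(self, response):
        # The base class relies on parse() to walk the rules
        # and dispatch each link to its callback.
        results = []
        for link, callback_name in self.rules:
            callback = getattr(self, callback_name)
            results.append(callback(link))
        return results


class Broken(ToyCrawlSpider):
    rules = [("http://example.com/a", "parse_item")]

    def parse(self, response):
        # Overriding parse() replaces the rule handling above,
        # so parse_item is never reached.
        return []

    def parse_item(self, response):
        return "item from " + response


class Working(ToyCrawlSpider):
    rules = [("http://example.com/a", "parse_item")]

    def parse_item(self, response):
        return "item from " + response


broken_result = Broken().parse("start page")
working_result = Working().parse("start page")
print(broken_result)
print(working_result)
```

In the toy model the Broken spider returns nothing because its own parse shadows the rule dispatch, while the Working spider reaches parse_item. The real CrawlSpider is more involved, but the shadowing problem is the same kind of thing.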

Upvotes: 1

Martijn Pieters

Reputation: 1123400

You do not need to pass in self when calling the inherited __init__() method with super():

def __init__(self):
    super(lala, self).__init__()    
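
A small plain-Python sketch of what the extra self does. The Base class here is hypothetical; it is loosely modelled on my understanding that Scrapy's base spider accepts an optional name as its first argument, so treat that part as an assumption rather than a description of Scrapy's code.

```python
# Hypothetical base class: takes an optional name, like (I believe)
# Scrapy's base spider does. This is an illustration only.
class Base(object):
    name = "default"

    def __init__(self, name=None):
        if name is not None:
            self.name = name


class PassesSelf(Base):
    name = "lala"

    def __init__(self):
        # super() already binds self, so this passes the instance
        # a second time, as the first real argument (name).
        super(PassesSelf, self).__init__(self)


class Correct(Base):
    name = "lala"

    def __init__(self):
        super(Correct, self).__init__()


wrong = PassesSelf()
right = Correct()
print(wrong.name is wrong)  # the object became its own name
print(right.name)
```

If the real base class behaves like this, the spider object ends up as its own name, which would plausibly explain why the logger cannot format its messages.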

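Looking at the example listed in the documentation, the attribute should be called rules, not rule. Likewise, the start URLs belong in start_urls, not start_url, and the callback is given as a plain method name rather than 'self.parse':

class lala(CrawlSpider):
    name = "lala"
    start_urls = ["http://www.lala.net/"]
    rules = [
        # callback is the name of a spider method; avoid naming it parse
        Rule(SgmlLinkExtractor(), follow=True, callback='parse_item')
    ]

    def parse_item(self, response):
        print "\n\n Middle \n"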
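
On the callback string used in the question, 'self.parse': as far as I know, a string callback is resolved by looking up an attribute of that name on the spider. Assuming that is done with getattr (my assumption, not something confirmed here), a dotted string never matches a method. The Spider class below is a hypothetical plain-Python stand-in.

```python
# Hypothetical stand-in for a spider; only used to show attribute lookup.
class Spider(object):
    def parse_item(self, response):
        return response


spider = Spider()

# A plain method name resolves to the bound method.
by_plain_name = getattr(spider, "parse_item", None)

# "self.parse_item" is treated as one literal attribute name,
# which does not exist, so the default (None) comes back.
by_dotted_name = getattr(spider, "self.parse_item", None)

print(callable(by_plain_name))
print(by_dotted_name)
```

So under that assumption the callback should be written as 'parse_item', without the self. prefix.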

Upvotes: 2
