Reputation: 3
Here is my code:
from scrapy import *
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class lala(CrawlSpider):
    name="lala"
    start_url=["http://www.lala.net/"]
    rule = [Rule(SgmlLinkExtractor(), follow=True, callback='self.parse')]

    def __init__(self):
        super(lala, self).__init__(self)
        print "\nworking\n"

    def parse(self,response):
        print "\n\n Middle \n"

print "\nend\n"
The problem is:
UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
2013-04-09 13:48:25+0100 UNFORMATTABLE OBJECT WRITTEN TO LOG with fmt '[%(system)s] %(text)s\n', MESSAGE LOST
Note that both end and working are printed in this case.
If I remove __init__, there is no error, but parse is never called: the Middle message is not printed.
Upvotes: 0
Views: 147
Reputation: 7889
The Scrapy documentation explicitly warns against overriding the parse method when using a CrawlSpider, because CrawlSpider uses parse itself to implement its rule logic.
Try renaming your parse method to something like parse_item and trying again.
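To see why the rename matters, here is a minimal pure-Python sketch. It is not Scrapy's actual source: ToyCrawlSpider is a hypothetical stand-in which assumes, like CrawlSpider, that the base class relies on its own parse to hand responses to the rule callbacks. Overriding parse in a subclass replaces that dispatch, so the callback never runs.

```python
class ToyCrawlSpider(object):
    """Hypothetical stand-in for CrawlSpider (not Scrapy's real code)."""

    rules = ()  # names of callback methods, given as strings

    def parse(self, response):
        # The base class needs its own parse() to run the rule callbacks.
        return [getattr(self, name)(response) for name in self.rules]


class Broken(ToyCrawlSpider):
    rules = ("parse_item",)

    def parse(self, response):  # replaces the dispatch logic above
        return []

    def parse_item(self, response):
        return "item from " + response


class Fixed(ToyCrawlSpider):
    rules = ("parse_item",)

    def parse_item(self, response):  # parse() is left untouched
        return "item from " + response


broken_result = Broken().parse("page")  # callback is never reached
fixed_result = Fixed().parse("page")    # callback runs via the base parse()
```

In the toy model, only Fixed ever reaches parse_item, which mirrors the advice to give the callback its own name.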
Upvotes: 1
Reputation: 1123400
You do not need to pass in self when calling the inherited __init__() method with super():

    def __init__(self):
        super(lala, self).__init__()
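A small sketch of what the extra argument does. ToyBase is a hypothetical class, and its optional name parameter is an assumption about the base spider's signature rather than something stated in this thread. Because super() already binds self, passing it again hands the instance in as the next positional argument.

```python
class ToyBase(object):
    """Hypothetical base class; the optional `name` parameter is an assumption."""

    def __init__(self, name=None):
        self.name = name if name is not None else "default"


class Wrong(ToyBase):
    def __init__(self):
        # super() already binds self, so this passes the instance as `name`.
        super(Wrong, self).__init__(self)


class Right(ToyBase):
    def __init__(self):
        super(Right, self).__init__()


wrong = Wrong()
right = Right()
```

Here wrong.name ends up being the object itself instead of a string. If the real base spider treats its first argument the same way, that would plausibly explain the unformattable log messages in the question, though that part is an inference.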
Looking at the example listed in the documentation, the attribute should be called rules, not rule:

    class lala(CrawlSpider):
        name="lala"
        start_url=["http://www.lala.net/"]
        rules = [
            Rule(SgmlLinkExtractor(), follow=True, callback='self.parse')
        ]
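The effect of a misspelled attribute can be sketched without Scrapy. ToyCrawler is a hypothetical base class that only ever reads rules; a subclass that defines rule instead raises no error, it simply ends up with no rules.

```python
class ToyCrawler(object):
    """Hypothetical base class that reads only the `rules` attribute."""

    rules = ()

    def rule_count(self):
        return len(self.rules)


class Misnamed(ToyCrawler):
    rule = ["follow links"]   # silently ignored: the base never looks at `rule`


class Named(ToyCrawler):
    rules = ["follow links"]  # picked up by the base class


misnamed_count = Misnamed().rule_count()
named_count = Named().rule_count()
```

The same reasoning applies to start_url in the snippet above: Scrapy reads start_urls, so that attribute name is worth checking as well.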
Upvotes: 2