Reputation: 67
I have been trying to get Scrapy's LinkExtractor to work, but to no avail. I want it to find any links and then call a different method that just prints something out to show it's working.
This is my spider:
from scrapy.spiders import Rule, CrawlSpider
from scrapy.linkextractors import LinkExtractor

class TestSpider(CrawlSpider):
    name = 'spi'
    allowed_domains = ['https://www.reddit.com/']
    start_urls = ['https://www.reddit.com/']

    rules = [
        Rule(LinkExtractor(allow=()),
             callback='detail', follow=True)
    ]

    def parse(self, response):
        print("parsed!")

    def detail(self, response):
        print('parsed detail!')
When I run the spider with "scrapy crawl spi", I only get "parsed!", so it only reaches the parse method and never the detail callback.
Upvotes: 1
Views: 1233
Reputation: 677
There's no need to comment out parse, but rename it to the default parse_item, or whatever you'd like. The point is that parse is already a logic method built into CrawlSpider.
In the future, when generating a spider, try "scrapy genspider -t crawl SPIDERNAME BASEURL" (no http(s)://www. prefix, i.e. site.com).
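A minimal sketch of the fixed spider, assuming the goal is just to reach the callback (note that allowed_domains expects bare domains, not URLs):

from scrapy.spiders import Rule, CrawlSpider
from scrapy.linkextractors import LinkExtractor

class TestSpider(CrawlSpider):
    name = 'spi'
    allowed_domains = ['reddit.com']  # bare domain; no scheme or path
    start_urls = ['https://www.reddit.com/']

    rules = [
        # every extracted link is dispatched to detail()
        Rule(LinkExtractor(allow=()), callback='detail', follow=True)
    ]

    def detail(self, response):
        # reached now that CrawlSpider's built-in parse() is left intact
        print('parsed detail!')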
Upvotes: 1
Reputation: 10210
If you are using the CrawlSpider base class for your spider, avoid overriding the parse method, as it will break the processing. Read the warning in the documentation.
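If you still need custom logic for the responses from start_urls, CrawlSpider exposes parse_start_url as the documented hook to override instead; a minimal sketch (the spider name here is made up):

from scrapy.spiders import Rule, CrawlSpider
from scrapy.linkextractors import LinkExtractor

class StartDemoSpider(CrawlSpider):
    name = 'start_demo'  # hypothetical name for this sketch
    allowed_domains = ['reddit.com']
    start_urls = ['https://www.reddit.com/']

    rules = [
        Rule(LinkExtractor(allow=()), callback='detail', follow=True)
    ]

    def parse_start_url(self, response):
        # CrawlSpider calls this for start_urls responses,
        # so parse() itself stays untouched
        print('start url fetched:', response.url)

    def detail(self, response):
        print('parsed detail!')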
Upvotes: 3