Reputation: 602
I am using scrapy to crawl an entire site, but my parser is never getting called. I've been looking at this forever, making little changes, but it's not working. Maybe it just needs a fresh pair of eyes on it. Here is my code:
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class FirstSpider(CrawlSpider):
    name = 'firstSpider'
    allowed_domains = ['http://example.com']
    start_urls = ['http://example.com']

    rules = (Rule(LinkExtractor(), callback='parse_page', follow=True),)

    def parse_page(self, response):
        print('made it to the parser...')
I don't see any errors in the logs. The request to example.com gets a 200 response, but the log shows: Filtered offsite request to 'www.iana.org'.
I'm using python3 on Ubuntu 16.04.
Thanks in advance for any tips.
Upvotes: 0
Views: 163
Reputation: 146510
The issue is this line:

allowed_domains = ['http://example.com']

allowed_domains should contain domain names, not URLs:

allowed_domains = ['example.com']

With a URL in there, no request hostname ever matches, so the offsite middleware filters every extracted link and your callback is never reached.
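To see why the URL form fails, here is a simplified sketch of the kind of hostname check an offsite filter performs (this is an illustration, not Scrapy's actual OffsiteMiddleware code; the is_offsite helper is hypothetical):

```python
from urllib.parse import urlparse

def is_offsite(url, allowed_domains):
    # Illustrative check: the request's hostname must equal an allowed
    # domain or be a subdomain of it. Note the hostname never contains
    # a scheme, so 'http://example.com' can never match.
    host = urlparse(url).hostname or ''
    return not any(
        host == d or host.endswith('.' + d) for d in allowed_domains
    )

# With the URL mistakenly used as a domain, every request is filtered:
print(is_offsite('http://example.com/page', ['http://example.com']))  # True
# With a bare domain name, on-site requests pass the filter:
print(is_offsite('http://example.com/page', ['example.com']))  # False
```

The parsed hostname is just 'example.com', which can never equal the string 'http://example.com', so everything gets filtered.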
Upvotes: 2