Reputation: 53
I'm trying to build a crawler that will crawl a list of sites by following all the links on their first page, then repeating this for the new pages. I think I might be using the rules attribute incorrectly: the spider never calls the processor method, no links appear to be followed, and there are no error messages. I've omitted some of the functions so the snippet only shows the changes I made to add crawling. I'm using Scrapy 1.5.
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class Scraper(CrawlSpider):
    name = "emails"
    lx = LinkExtractor()
    rules = [Rule(link_extractor=lx, follow=True, process_links='processor',
                  callback='landed')]

    def start_requests(self):
        self.inf = DataInterface()
        df = self.inf.searchData()
        row = df.iloc[2]
        print(row)
        # url = 'http://' + row['Website'].lower()
        # self.rules.append()
        url = 'http://example.com/Page.php?ID=7'
        req = scrapy.http.Request(url=url, callback=self.landed,
                                  meta={'index': 1, 'depth': 0,
                                        'firstName': row['First Name'],
                                        'lastName': row['Last Name'],
                                        'found': {}, 'title': row['Title']})
        yield req
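For reference, the two hooks named in the rule would have signatures along these lines; the bodies below are only placeholders standing in for the functions I omitted:

    def processor(self, links):
        # process_links hook: receives the list of links the LinkExtractor
        # found and must return the (possibly filtered) list.
        print('processor called with %d links' % len(links))
        return links

    def landed(self, response):
        # Rule callback: receives the response for each followed page.
        print('landed on %s' % response.url)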
Upvotes: 0
Views: 65
Reputation: 173
Try adding a parse method after your code and changing your callback to self.parse:
def start_requests(self):
    self.inf = DataInterface()
    df = self.inf.searchData()
    row = df.iloc[2]
    print(row)
    # url = 'http://' + row['Website'].lower()
    # self.rules.append()
    url = 'http://example.com/Page.php?ID=7'
    req = scrapy.http.Request(url=url, callback=self.parse,
                              meta={'index': 1, 'depth': 0,
                                    'firstName': row['First Name'],
                                    'lastName': row['Last Name'],
                                    'found': {}, 'title': row['Title']})
    yield req

def parse(self, response):
    print(response.text)
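CrawlSpider runs its link-extraction rules from its parse method, so the response has to reach parse for any links to be followed at all. As a minimal sketch (reusing the URL from the question), omitting the callback achieves the same routing, since Scrapy then falls back to self.parse:

def start_requests(self):
    # With no explicit callback, Scrapy routes the response to self.parse,
    # which is where CrawlSpider applies the rules.
    yield scrapy.Request(url='http://example.com/Page.php?ID=7',
                         meta={'index': 1, 'depth': 0})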
Upvotes: 1