Hussar

Reputation: 53

Using CrawlSpider rules in Scrapy

I'm trying to build a crawler that crawls a list of sites by following every link on each site's first page, then repeating this for the newly discovered pages. I think I might be using the rules attribute incorrectly: the spider never calls the processor method. It seems that no links are ever followed, and there are no error messages. I've omitted some of the functions to show only the changes I made to add crawling. I'm using Scrapy 1.5.

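import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class Scraper(CrawlSpider):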
    name = "emails"
    lx = LinkExtractor()
    rules = [Rule(link_extractor=lx, follow=True, process_links='processor', callback='landed')]

    def start_requests(self):
        self.inf = DataInterface()
        df = self.inf.searchData()

        row = df.iloc[2]
        print(row)
        #url = 'http://' + row['Website'].lower()
        #self.rules.append()
        url = 'http://example.com/Page.php?ID=7'
        req = scrapy.http.Request(url=url, callback=self.landed,
                                meta={'index': 1, 'depth': 0,
                                    'firstName': row['First Name'],
                                    'lastName': row['Last Name'],
                                    'found': {}, 'title': row['Title']})
        yield req

Upvotes: 0

Views: 65

Answers (1)

Hyperion

Reputation: 173

Try changing the callback of your start request to parse, and add a method after your code to handle the response:

def start_requests(self):
    self.inf = DataInterface()
    df = self.inf.searchData()

    row = df.iloc[2]
    print(row)
    #url = 'http://' + row['Website'].lower()
    #self.rules.append()
    url = 'http://example.com/Page.php?ID=7'
    req = scrapy.http.Request(url=url, callback=self.parse,
                            meta={'index': 1, 'depth': 0,
                                'firstName': row['First Name'],
                                'lastName': row['Last Name'],
                                'found': {}, 'title': row['Title']})
    yield req

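# Name this parse_start_url rather than parse: in Scrapy 1.5, CrawlSpider uses
# its own parse() to apply the rules, so overriding parse() would disable them.
def parse_start_url(self, response):
    print(response.text)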

Upvotes: 1
