Abhishek Singh

Reputation: 11

Scrapy doesn't call a callback function other than the default 'parse'

import scrapy


class TextCrawler(scrapy.Spider):
    name = 'Text'
    allowed_domains = ["textfiles.com/100"]
    start_urls = ['http://textfiles.com/100/']

    def parse(self, response):
        link = response.css('a::attr(href)').extract()
        for i in link:
            temp = "http://www.textfiles.com/100/" + i
            data = scrapy.Request(temp, callback=self.parsetwo)

The 'parsetwo' function does not get called.

def parsetwo(self, response):
    print(response.text)

Upvotes: 1

Views: 55

Answers (1)

bla

Reputation: 1870

There are two problems with your current approach:

  1. Subsequent requests must be returned (or yielded) from your parse method. Merely constructing a Request object does nothing; Scrapy only schedules requests that the callback hands back to it.
  2. allowed_domains = ["textfiles.com/100"] causes every subsequent request to be filtered out as offsite, because the actual domain is textfiles.com. allowed_domains should contain bare domains, never paths.

I made those two changes and got it to work.

from scrapy import Spider
from scrapy import Request


class TextCrawler(Spider):
    name = 'Text'
    allowed_domains = ['textfiles.com']  # domain only, no path
    start_urls = ['http://textfiles.com/100/']

    def parse(self, response):
        # Collect the href of every link on the directory page.
        link = response.css('a::attr(href)').extract()

        for i in link:
            temp = 'http://textfiles.com/100/' + i
            # Yielding the request hands it to Scrapy's scheduler;
            # merely constructing it (as before) does nothing.
            yield Request(temp, callback=self.parsetwo)

    def parsetwo(self, response):
        print(response.text)
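
If you want to run the fixed spider without generating a whole Scrapy project, Scrapy's CrawlerProcess can drive it from a plain script. This is a minimal sketch assuming the TextCrawler class above is defined in the same file; the LOG_LEVEL setting is just an optional convenience.

from scrapy.crawler import CrawlerProcess

# Run the spider standalone; assumes TextCrawler is defined above
# in the same script.
process = CrawlerProcess(settings={'LOG_LEVEL': 'INFO'})
process.crawl(TextCrawler)
process.start()  # blocks until the crawl finishes

As a side note, on Scrapy 1.4+ you could write yield response.follow(i, callback=self.parsetwo) inside parse, which resolves relative hrefs against the page URL, so the hard-coded 'http://textfiles.com/100/' prefix isn't needed.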

Upvotes: 1
