hellpanderr

Reputation: 5906

Scrapy callback after redirect

I have a very basic Scrapy spider that reads URLs from a file and downloads them. The only problem is that some of them get redirected to a slightly modified URL within the same domain. I want to get those URLs in my callback function using response.meta. It works for normal URLs, but when a URL is redirected the callback doesn't seem to get called. How can I fix this? Here's my code:

    from scrapy.contrib.spiders import CrawlSpider
    from scrapy import log
    from scrapy import Request

    class DmozSpider(CrawlSpider):
        name = "dmoz"
        handle_httpstatus_list = [302]
        # allowed_domains takes bare domain names, not URLs
        allowed_domains = ["www.exmaple.net"]
        f = open("C:\\python27\\1a.csv", 'r')
        url = 'http://www.exmaple.net/Query?indx='
        start_urls = [url + row for row in f.readlines()]

        def parse(self, response):
            print response.meta.get('redirect_urls', [response.url])
            print response.status
            print (response.headers.get('Location'))

I've also tried something like this:

    def parse(self, response):
        return Request(response.url,
                       meta={'dont_redirect': True,
                             'handle_httpstatus_list': [302]},
                       callback=self.parse_my_url)

    def parse_my_url(self, response):
        print response.status
        print (response.headers.get('Location'))

And it doesn't work either.

Upvotes: 4

Views: 1775

Answers (1)

Tasawer Nawaz

Reputation: 935

By default, Scrapy follows redirects. If you don't want a request to be redirected, override the start_requests method and set the dont_redirect and handle_httpstatus_list keys in the request meta, like this:

    def start_requests(self):
        # start_urls already holds complete URLs, so use them as-is
        # instead of prepending self.url a second time.
        return [Request(u,
                        meta={'handle_httpstatus_list': [302],
                              'dont_redirect': True},
                        callback=self.parse)
                for u in self.start_urls]
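
With redirects switched off, the callback receives the 302 response itself, so the target has to be read from the Location header. Below is a minimal sketch of that step, written for Python 3 and using only the standard library. `build_start_urls` and `redirect_target` are hypothetical helper names, not part of Scrapy. The sketch assumes the header value may arrive as bytes (as Scrapy header values do) and may be a relative URL, and it also strips the trailing newline that `readlines()` leaves on each row of the CSV.

```python
from urllib.parse import urljoin


def build_start_urls(base_url, rows):
    """Append each non-empty row to base_url, dropping newlines and blank rows."""
    return [base_url + row.strip() for row in rows if row.strip()]


def redirect_target(request_url, location):
    """Return the absolute redirect URL, or None if there is no Location header."""
    if location is None:
        return None
    if isinstance(location, bytes):
        # Header values may be bytes; Latin-1 decoding never raises.
        location = location.decode("latin-1")
    # urljoin keeps absolute URLs unchanged and resolves relative ones
    # against the URL that was requested.
    return urljoin(request_url, location)


# Rows as readlines() would return them, including a blank line.
urls = build_start_urls("http://www.exmaple.net/Query?indx=", ["12\n", "\n", "34\r\n"])
print(urls)
print(redirect_target(urls[0], b"/Query?indx=12&page=1"))
```

Inside the spider's callback this would be called as `redirect_target(response.url, response.headers.get('Location'))`.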

Upvotes: 2
