In Scrapy, I want to crawl some pages that link to large .zip files and retrieve some data about those files (size, URL, etc.). One way would be to yield requests for those URLs, but I think that downloads the files. How can I get only the headers for the zip URLs? Would it be better not to crawl those URLs at all and get the headers some other way?
Yield requests and specify `HEAD` as the method:

```python
from scrapy import Request

yield Request(url, method="HEAD", callback=self.callback)
```

Then, in the callback, read the headers from `response.headers`:

```python
def callback(self, response):
    print(response.headers)
```