Scrapy - getting the file size and type from a URL without downloading the file?

Question

In Scrapy I want to crawl some pages that link to large .zip files, and retrieve some data (size, URL, etc.) about those files. One way I could do this is to yield requests for these URLs, but I think this downloads the files. How can I get only the headers for the zip URLs? Would it be better not to crawl the URLs that I want the headers from, and instead retrieve them some other way?

alecxe · Accepted Answer

Yield requests and specify HEAD as a method:

yield Request(url, method="HEAD", callback=self.callback)

Then, in the callback, read the headers from response.headers:

def callback(self, response):
    # A HEAD response carries no body; the file metadata is in the headers.
    print(response.headers)
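
To get the size and type out of those headers, something along these lines might work. This is only a sketch: file_info is a hypothetical helper name, not part of Scrapy, and it assumes the server actually sends Content-Length and Content-Type in reply to a HEAD request, which is not guaranteed. Scrapy returns header values as bytes, so they are decoded before use.

```python
def file_info(url, headers):
    """Summarise a HEAD response from its headers (hypothetical helper).

    `headers` is any mapping with a .get() method, such as response.headers.
    """
    def text(name):
        value = headers.get(name)
        if value is None:
            return None
        # Scrapy hands back header values as bytes; plain strings pass through.
        if isinstance(value, bytes):
            value = value.decode("latin-1")
        return value.strip()

    length = text("Content-Length")
    return {
        "url": url,
        # Content-Length may be missing or malformed, so size can be None.
        "size": int(length) if length and length.isdigit() else None,
        "type": text("Content-Type"),
    }
```

In the callback this could be used as yield file_info(response.url, response.headers). Keep in mind that some servers reject HEAD requests, and by default Scrapy does not pass non-2xx responses to the callback.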
