false_azure

Reputation: 1503

Scrapy - getting the file size and type from a URL without downloading the file?

In Scrapy, I want to crawl some pages that link to large .zip files and retrieve some data (size, URL, etc.) about those files. One way to do this is to yield requests for these URLs, but I think that downloads the files. How can I get only the headers for the zip URLs? Would it be better not to crawl the URLs I want the headers from, and instead retrieve them some other way?

Upvotes: 2

Views: 1945

Answers (1)

alecxe

Reputation: 473873

Yield requests and specify HEAD as the method:

yield Request(url, method="HEAD", callback=self.callback)

Then, in the callback read the headers from response.headers:

def callback(self, response):
    # A HEAD response carries no body; the file metadata is in the headers
    print(response.headers)
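
To get the size and type asked about in the question, the callback can read Content-Length and Content-Type out of those headers. Below is a minimal sketch; the helper name file_info_from_headers and the shape of the returned dict are made up for illustration and are not part of Scrapy. It only assumes that headers has a .get() method, and it accepts byte values because Scrapy returns header values as bytes. Some servers omit Content-Length on HEAD responses, so the size may come back as None.

```python
def file_info_from_headers(headers, url):
    """Hypothetical helper: pull file size and MIME type out of HEAD response headers."""

    def to_text(value):
        # Scrapy returns header values as bytes; plain dicts may hold str
        if value is None:
            return None
        if isinstance(value, bytes):
            return value.decode("latin-1")
        return str(value)

    length = to_text(headers.get("Content-Length"))
    content_type = to_text(headers.get("Content-Type"))

    # Content-Length is optional, so fall back to None when it is missing or malformed
    size = int(length) if length and length.strip().isdigit() else None

    # Drop parameters such as "; charset=..." and keep only the MIME type
    mime = content_type.split(";")[0].strip() if content_type else None

    return {"url": url, "size": size, "type": mime}


# Possible use inside the spider callback (assumes the HEAD request shown earlier):
#
#     def callback(self, response):
#         yield file_info_from_headers(response.headers, response.url)
```

Yielding the dict from the callback turns each zip URL into a scraped item without ever requesting the file body.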

Upvotes: 3
