What's the best way to disable image download in scrapy?

Question

It is not disabled by default.

I have written a spider that consumes almost 2 GB of data per hour. I want to reduce my data consumption. Images are of no use to me, so I want to make sure they are not being fetched.

Given that this is a P0 scenario, I expected a simple flag in settings.py, but surprisingly I couldn't find one in the docs. I found a lot of detail about ImagesPipeline, how to enable those pipelines, their storage, and so on, but no flag for people who are not interested in images. Let me know if I am missing anything.

Gallaecio · Accepted Answer

Scrapy does not download images unless you explicitly tell it to.

You can check the URLs that Scrapy downloads in the run-time logs. If an image URL does not appear in the logs, it is not being downloaded, even if a webpage that contains images is downloaded.
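As a rough way to automate that log check, here is a minimal sketch. It assumes the log contains Scrapy's usual DEBUG lines of the form `Crawled (200) <GET https://...> (referer: ...)`. The function name `crawled_image_urls` and the extension list are illustrative choices, not part of Scrapy, and matching on file extension is only a heuristic: an image served from a URL without an extension would be missed.

```python
import re
from urllib.parse import urlparse

# Matches the "Crawled" line Scrapy logs at DEBUG level for each response, e.g.
# "... [scrapy.core.engine] DEBUG: Crawled (200) <GET https://example.com/> (referer: None)"
CRAWLED = re.compile(r"Crawled \((\d+)\) <(?:GET|POST|HEAD) (\S+)>")

# Illustrative list of extensions; extend it as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg")


def crawled_image_urls(log_lines):
    """Return URLs from 'Crawled' log lines whose path ends in an image extension."""
    found = []
    for line in log_lines:
        match = CRAWLED.search(line)
        if not match:
            continue
        url = match.group(2)
        # Compare only the path, so query strings such as "?v=2" are ignored.
        if urlparse(url).path.lower().endswith(IMAGE_EXTENSIONS):
            found.append(url)
    return found
```

If the log is written to a file (for example through the `LOG_FILE` setting), something like `crawled_image_urls(open("scrapy.log"))` should work, where `scrapy.log` is a placeholder file name. An empty result suggests that no image URLs were requested.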

When you open a downloaded page in a web browser, the browser downloads the images on the fly. They do not come from the downloaded webpage, because images are not (usually) embedded in it. The webpage only indicates where on the Internet they are, and the web browser fetches them in order to display them. Scrapy does not.
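To illustrate the point that a page only references its images, here is a small sketch using Python's standard `html.parser`. The page body, the class `ImageReferenceCollector`, and the function `image_references` are made up for the example. Nothing in it touches the network: the image URLs are just strings inside the HTML.

```python
from html.parser import HTMLParser


class ImageReferenceCollector(HTMLParser):
    """Collect the src attribute of every <img> tag without fetching anything."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_references(html):
    """Return the image URLs that the HTML refers to."""
    collector = ImageReferenceCollector()
    collector.feed(html)
    return collector.sources


# Hypothetical page body, similar to what a spider receives as response.text.
page = (
    "<html><body><h1>Title</h1>"
    '<img src="https://example.com/a.jpg"><img src="/b.png">'
    "</body></html>"
)

print(image_references(page))
print(len(page.encode("utf-8")), "bytes in the page itself")
```

Inside a spider callback, `response.css("img::attr(src)").getall()` gives a similar list. In both cases only the HTML has been downloaded, and the images would cost bandwidth only if you went on to request those URLs.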

The only exception is when images are actually embedded in the HTML code as base64. This is uncommon, and probably not your case. When it does happen, there is no way to prevent their download: you cannot download a webpage while excluding part of its content.
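If you want to check whether embedded images account for a meaningful share of your traffic, a sketch along these lines could help. It assumes the embedded images use `data:image/...;base64,` URIs. The function name `embedded_image_bytes` is hypothetical, and the regular expression is a simple approximation rather than a full data URI parser.

```python
import re

# Matches base64 image data URIs such as "data:image/png;base64,iVBORw0..."
DATA_URI = re.compile(r"data:image/[\w.+-]+;base64,[A-Za-z0-9+/=]+")


def embedded_image_bytes(html):
    """Return (bytes taken by base64 image data URIs, total bytes of the page)."""
    embedded = sum(
        len(match.group(0).encode("utf-8")) for match in DATA_URI.finditer(html)
    )
    return embedded, len(html.encode("utf-8"))
```

Calling it on `response.text` for a few representative pages shows how much of each page is embedded image data. If the first number is zero or small, the bandwidth is being spent elsewhere.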
