Prasanta Kakati

Reputation: 3

Scrapy: prevent downloading files that have already been downloaded

I have created a scraper that downloads all files from a website and saves the download links to a JSON file using an item pipeline. How can I prevent the scraper from downloading a file again if its link is already in the JSON file?

Upvotes: 0

Views: 655

Answers (1)

neverlastn

Reputation: 2204

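Great question! What you want to do is quite complex to implement programmatically in a generic way: you would have to write your own middleware or customise `RFPDupeFilter`. But you are very lucky, because there is another generic way to achieve exactly what you want: pausing and resuming crawls, which is already implemented and tested in Scrapy. Run the spider with the `JOBDIR` setting, e.g. `scrapy crawl somespider -s JOBDIR=crawls/somespider-1`. Scrapy then keeps the crawl state, including the dupe filter's seen-request fingerprints, in that directory, so a run started with the same `JOBDIR` does not repeat requests the dupe filter has already seen.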
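If you would rather work from the JSON file you already have, the core "skip what is already downloaded" check is simple enough to sketch in plain Python. This is a minimal sketch, not a Scrapy API: `load_downloaded_links` and `links_to_download` are hypothetical helpers, and the code assumes your pipeline writes a single JSON array of items (not JSON Lines) with each link stored under a `"url"` key. Adjust `key` to match your actual item field.

```python
import json
import os


def load_downloaded_links(json_path, key="url"):
    """Return the set of links already recorded by the pipeline.

    Hypothetical helper. Assumes the file holds a JSON array of dicts with the
    link under `key`. A missing file means nothing has been downloaded yet.
    """
    if not os.path.exists(json_path):
        return set()
    with open(json_path, encoding="utf-8") as f:
        items = json.load(f)
    # Skip items that do not carry the link field at all.
    return {item[key] for item in items if key in item}


def links_to_download(candidate_links, already_downloaded):
    """Keep only links not downloaded before, preserving order and dropping repeats."""
    seen = set(already_downloaded)
    fresh = []
    for link in candidate_links:
        if link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh
```

In a spider you might call `load_downloaded_links` once (for example in `__init__`) and pass the result to `links_to_download` in `parse`, yielding a request only for the links it returns. Unlike `JOBDIR`, this keys on the exact link strings stored in your JSON file rather than on Scrapy's request fingerprints.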

Upvotes: 1
