Reputation: 55
I am trying to scrape a web page with Scrapy, but in chunks. The goal is to read only the title of the page, not the full page.
For example, if a page is 150 KB but I only want the title, which should be near the top and within the first 10 KB, then reading the first part is enough to find the title and I could cancel the rest of the download.
Is it possible to make Scrapy read the page in chunks?
Upvotes: 0
Views: 640
Reputation: 3857
Scrapy does not currently support stopping the reading of a response before it has finished.
You might want to monitor some of the related existing feature requests:
Provide DownloaderMiddleware an interface to read raw HTTP requests and responses
More possibilities to cancel downloads inside HTTP downloader handler
It may also make sense to create a new feature request that focuses on your scenario, since you want to stop reading a response but still get the data read so far in your callbacks, which I don’t think is covered in existing feature requests.
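In the meantime, the "read until the title shows up, then stop" idea can be tried outside Scrapy's downloader. Below is a minimal sketch using only the Python standard library. The names `iter_chunks` and `read_title`, the 4096-byte chunk size, and the 10 KB cap are assumptions made for illustration, not Scrapy API, and the regex is a rough match rather than a real HTML parser.

```python
import io
import re

# Hypothetical helpers for illustration only; none of this is Scrapy API.
# Rough pattern for a <title> element; a real page may need an HTML parser.
TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def iter_chunks(stream, chunk_size=4096):
    """Yield blocks of at most chunk_size bytes from a file-like object."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def read_title(chunks, max_bytes=10_240):
    """Return the <title> text as bytes, or None if not found within max_bytes."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        match = TITLE_RE.search(buffer)
        if match:
            # Stop consuming chunks as soon as the title is complete.
            return match.group(1).strip()
        if len(buffer) >= max_bytes:
            # Give up once the assumed size cap is reached.
            break
    return None
```

Because `read_title` accepts any iterable of byte chunks, it could be fed from a streamed HTTP response, for example `read_title(iter_chunks(response))` where `response` comes from `urllib.request.urlopen(url)` used in a `with` block. Whether the server actually stops sending the rest of the page after the response is closed depends on the connection, so treat the bandwidth saving as likely rather than guaranteed.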
Upvotes: 1