Reputation: 57651
This question is essentially the same as Pass scraped URL's from one spider to another, but I'd like to double-check whether there is no 'Scrapy-native' way to do this.
I'm scraping web pages which 99% of the time can be scraped successfully without rendering JavaScript. Sometimes, however, this fails and certain Fields are not present. I'd like to write a Scrapy extension with an item_scraped method which checks whether all expected fields are populated and, if not, yields a SplashRequest to a different spider whose custom_settings include the Splash settings (cf. https://blog.scrapinghub.com/2015/03/02/handling-javascript-in-scrapy-with-splash/).
Is there any Scrapy way to do this without using an external service (like Redis)?
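For what it's worth, the "check for missing fields, retry through Splash" idea can be sketched inside a single spider, without a second spider or an external queue. The field names and the extract_item helper below are hypothetical placeholders; only the missing_fields check is concrete:

```python
# Sketch, assuming scrapy and scrapy-splash are installed.
# EXPECTED_FIELDS and extract_item are hypothetical examples.
EXPECTED_FIELDS = {"title", "price", "description"}

def missing_fields(item, expected=EXPECTED_FIELDS):
    """Return the expected fields that are absent or empty in an item dict."""
    return {f for f in expected if not item.get(f)}

# In a spider callback, one could retry the same URL through Splash
# instead of handing the URL off to a different spider:
#
#     from scrapy_splash import SplashRequest
#
#     def parse(self, response):
#         item = self.extract_item(response)  # hypothetical helper
#         if missing_fields(item):
#             yield SplashRequest(response.url, self.parse,
#                                 args={"wait": 0.5}, dont_filter=True)
#         else:
#             yield item
```

dont_filter=True matters here, because the plain request for the same URL has already been seen by the dupefilter.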
Upvotes: 1
Views: 394
Reputation: 22238
Enabling scrapy-splash only makes SplashRequest work; it does not affect regular scrapy.Request (as long as there is no 'splash' key in request.meta).
You can enable the Splash settings and still yield scrapy.Request - such requests will be processed without Splash.
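For reference, the settings in question are the ones from the scrapy-splash README, roughly as below (the Splash URL is whatever your Splash instance listens on; localhost:8050 is just the common default). With these enabled, plain scrapy.Request objects still bypass Splash entirely:

```python
# settings.py fragment for scrapy-splash (per the scrapy-splash README).
SPLASH_URL = "http://localhost:8050"  # assumption: local Splash instance

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```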
Upvotes: 4