freeloader

Reputation: 73

retrieve scraped items from scrapy crawl when triggered via CrawlerRunner

I have 2 spiders in a scrapy project. They work just fine and produce the required output items.

I want to execute these spiders in a background job in a web application.

Everything is set up: a Flask app with a Redis-backed background job, and a frontend that waits for the results. All is well.

Except I can't seem to work out how to get the resulting items from the spiders when they execute.

The closest I've come is the answer to this question:

Get Scrapy crawler output/results in script file function

but it seems to refer to an older version of Scrapy (I'm using 1.4.0), and I get this deprecation warning:

'ScrapyDeprecationWarning: Importing from scrapy.xlib.pydispatch is deprecated and will no longer be supported in future Scrapy versions. If you just want to connect signals use the from_crawler class method, otherwise import pydispatch directly if needed. See: https://github.com/scrapy/scrapy/issues/1762'

Checking that GitHub issue suggests this approach hasn't worked since around v1.1.0.

So, can anyone tell me how to do this now?

Upvotes: 1

Views: 480

Answers (1)

freeloader

Reputation: 73

Turns out it's pretty easy; it must have been too late at night for me.

Replace

from scrapy.xlib.pydispatch import dispatcher

with

from pydispatch import dispatcher

as the deprecation warning itself clearly says:

"otherwise import pydispatch directly if needed."
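For completeness, here is a sketch of how the corrected import fits into the item-collection pattern from the linked answer. The `ItemCollector` class is plain Python and is my own naming; the wiring in the comments assumes Scrapy 1.4 with `pydispatch` installed, and `MySpider` is a placeholder for your own spider class.

```python
class ItemCollector:
    """Accumulates every item any spider yields during a crawl."""

    def __init__(self):
        self.items = []

    def item_scraped(self, item, response=None, spider=None):
        # Signature matches the arguments Scrapy sends with the
        # item_scraped signal.
        self.items.append(item)


# Wiring inside the background job (requires scrapy and pydispatch):
#
#   from pydispatch import dispatcher            # NOT scrapy.xlib.pydispatch
#   from scrapy import signals
#   from scrapy.crawler import CrawlerRunner
#   from scrapy.utils.project import get_project_settings
#   from twisted.internet import reactor
#
#   collector = ItemCollector()
#   dispatcher.connect(collector.item_scraped, signal=signals.item_scraped)
#   runner = CrawlerRunner(get_project_settings())
#   d = runner.crawl(MySpider)                   # MySpider: your spider class
#   d.addBoth(lambda _: reactor.stop())
#   reactor.run()
#   # collector.items now holds everything the spiders produced
```

The collector itself has no Scrapy dependency, so the same object can gather items from both spiders in one run and hand the list back to the Flask job when the reactor stops.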

Upvotes: 3
