How to get returned list from scrapy spider

Question

I'm writing a script to test my spiders, but I don't know how to capture the data a spider returns in the script that runs it.

At the end of the spider I have return [self.p_name, self.price, self.currency].

In the spider tester I have this script:

#!/usr/bin/python3
#-*- coding: utf-8 -*-

# Import external libraries
import scrapy
from scrapy.crawler import CrawlerProcess
from Ruby.spiders.furla import Furla

# Import internal libraries

# Variables

t = CrawlerProcess()

def test_furla():
    x = t.crawl(Furla, url='https://www.furla.com/pt/pt/eshop/furla-sleek-BAHMW64BW000ZN98.html?dwvar_BAHMW64BW000ZN98_color=N98&cgid=SS20-Main-Collection')
    return x

test_furla()
t.start()

It runs properly; the only problem is that I don't know how to catch that return value on the tester side. The output from the spider is ['FURLA SLEEK', '250.00', '€'].

Tomáš Linhart · Accepted Answer

If you need to access items yielded from the spider, I would probably use signals for the job, specifically the item_scraped signal. Adapting your code, it would look something like this:

from scrapy import signals
# other imports and stuff

t = CrawlerProcess()

def item_scraped(item, response, spider):
    # do something with the item; here it is just printed
    print(item)

def test_furla():
    # we need Crawler instance to access signals
    crawler = t.create_crawler(Furla)
    crawler.signals.connect(item_scraped, signal=signals.item_scraped)
    x = t.crawl(crawler, url='https://www.furla.com/pt/pt/eshop/furla-sleek-BAHMW64BW000ZN98.html?dwvar_BAHMW64BW000ZN98_color=N98&cgid=SS20-Main-Collection')
    return x

test_furla()
t.start()

Additional information can be found in the CrawlerProcess documentation. If, on the other hand, you need to work with the whole output of the spider (all the items), you would need to accumulate the items using the mechanism above and work with them once the crawl finishes.
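
The accumulation could look roughly like the sketch below. It is only one way to do it: ItemCollector and run_and_collect are hypothetical names, and it assumes the spider yields dict-like items (for example {'name': ..., 'price': ..., 'currency': ...}) rather than returning a bare list of strings.

```python
class ItemCollector:
    """Accumulates every item reported through Scrapy's item_scraped signal."""

    def __init__(self):
        self.items = []

    def item_scraped(self, item, response, spider):
        # Same arguments as the item_scraped handler in the answer above.
        self.items.append(item)


def run_and_collect(spider_cls, **spider_kwargs):
    """Run one spider to completion and return the list of scraped items."""
    # Imported here so ItemCollector can be exercised without Scrapy installed.
    from scrapy import signals
    from scrapy.crawler import CrawlerProcess

    collector = ItemCollector()
    process = CrawlerProcess()
    # A Crawler instance is needed to reach the signal manager.
    crawler = process.create_crawler(spider_cls)
    crawler.signals.connect(collector.item_scraped, signal=signals.item_scraped)
    process.crawl(crawler, **spider_kwargs)
    process.start()  # blocks until the crawl has finished
    return collector.items
```

In the tester this would be called as items = run_and_collect(Furla, url='...'), after which items can be checked with ordinary assertions. Because the Twisted reactor cannot be restarted, a helper like this can only be called once per Python process.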
