Reputation: 888
I'm writing a script to test my spiders, but I don't know how to capture the data a spider returns from inside the script that runs it.
At the end of the spider I have this statement:
return [self.p_name, self.price, self.currency]
In the spider tester I have this script:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Import external libraries
import scrapy
from scrapy.crawler import CrawlerProcess
from Ruby.spiders.furla import Furla
# Import internal libraries
# Variables
t = CrawlerProcess()

def test_furla():
    x = t.crawl(Furla, url='https://www.furla.com/pt/pt/eshop/furla-sleek-BAHMW64BW000ZN98.html?dwvar_BAHMW64BW000ZN98_color=N98&cgid=SS20-Main-Collection')
    return x

test_furla()
t.start()
It runs properly; the only problem is that I don't know how to catch that return value on the tester side. The output from the spider is ['FURLA SLEEK', '250.00', '€'].
Upvotes: 1
Views: 914
Reputation: 10220
If you need to access the items yielded by the spider, I would probably use signals for the job, specifically the item_scraped signal. Adapted to your code, it would look something like this:
from scrapy import signals
# other imports and stuff
t = CrawlerProcess()

def item_scraped(item, response, spider):
    # do something with the item; printing is only a placeholder
    print(item)

def test_furla():
    # we need a Crawler instance to access signals
    crawler = t.create_crawler(Furla)
    crawler.signals.connect(item_scraped, signal=signals.item_scraped)
    x = t.crawl(crawler, url='https://www.furla.com/pt/pt/eshop/furla-sleek-BAHMW64BW000ZN98.html?dwvar_BAHMW64BW000ZN98_color=N98&cgid=SS20-Main-Collection')
    return x

test_furla()
t.start()
Additional info can be found in the CrawlerProcess documentation. If, on the other hand, you need to work with the spider's whole output (all the items), you would need to accumulate the items using the mechanism above and work with them once the crawl finishes.
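A minimal sketch of that accumulation, not a definitive implementation. ItemCollector and crawl_and_collect are hypothetical names I made up for illustration. It assumes the spider yields items (for example a dict) rather than returning a plain list of strings, since item_scraped only fires for items. Scrapy is imported inside the function only so that the collector class can be used on its own.

```python
class ItemCollector:
    """Hypothetical helper: stores every item passed to the signal handler."""

    def __init__(self):
        self.items = []

    def item_scraped(self, item, response, spider):
        # Called once per scraped item when connected to signals.item_scraped
        self.items.append(item)


def crawl_and_collect(spider_cls, **spider_kwargs):
    """Hypothetical helper: run one spider and return the items it yielded."""
    from scrapy import signals
    from scrapy.crawler import CrawlerProcess

    collector = ItemCollector()
    process = CrawlerProcess()
    # A Crawler instance is needed to reach the signal manager
    crawler = process.create_crawler(spider_cls)
    crawler.signals.connect(collector.item_scraped, signal=signals.item_scraped)
    process.crawl(crawler, **spider_kwargs)
    process.start()  # blocks until the crawl has finished
    return collector.items
```

In the tester you would then call something like items = crawl_and_collect(Furla, url='...') and assert on the returned list. Keep in mind that process.start() runs the Twisted reactor, which cannot be restarted, so a helper like this can only be called once per Python process.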
Upvotes: 2