Reputation: 1982
I have a Scrapy script that yields items from several chained callbacks, and I would like those items combined into a single dict per page.
Scrapy script:
from pprint import pprint

from scrapy import Request
from scrapy.spiders import XMLFeedSpider


class test_spider(XMLFeedSpider):
    name = 'test'
    start_urls = ['https://www.example.com']
    custom_settings = {
        'ITEM_PIPELINES': {
            'test.test_pipe': 100,
        },
    }
    itertag = 'pages'

    def parse1(self, response, node):
        # one follow-up request per <pages> node
        yield Request(
            'https://www.example.com/' + node.xpath('@id').extract_first() + '/xml-out',
            callback=self.parse2,
        )

    def parse2(self, response):
        yield {'COLLECT1': response.xpath('/@id').extract_first()}
        # `root` is an XPath prefix assumed to be defined elsewhere
        pages = response.xpath(root + '/node[@id="page"]/text()').extract_first() or ''
        for text in pages.split('^'):
            if text:
                yield Request(
                    'https://www.example.com/' + text,
                    callback=self.parse3,
                    dont_filter=True,
                )

    def parse3(self, response):
        yield {'COLLECT2': response.xpath('/@id').extract_first()}


class listings_pipe(object):
    def process_item(self, item, spider):
        pprint(item)
        return item
The ideal result would be a combined dict item such as
{'COLLECT1':'some data','COLLECT2':['some data','some data',...]}
Is there a way to call the pipeline after each parse1 event and get a combined dict of the items?
Upvotes: 1
Views: 1041
Reputation: 799
In your parse2 method, use the meta argument of Request to pass your COLLECT1 value along to parse3. Then in parse3, read COLLECT1 back from response.meta, extract your COLLECT2 values, and yield the combined result as you wish.
For more info on meta you can read here
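A minimal sketch of that idea, producing the exact combined shape from the question: it chains the page requests sequentially, so the partly-built item rides along in meta until every page has been visited. The sequential chaining and the pending-URL list are one possible design, not something Scrapy mandates, and root is assumed to be defined as in the question. The two methods below are drop-in replacements for parse2 and parse3 in the spider above, reusing its imports:

    def parse2(self, response):
        item = {'COLLECT1': response.xpath('/@id').extract_first(), 'COLLECT2': []}
        pages = response.xpath(root + '/node[@id="page"]/text()').extract_first() or ''
        urls = ['https://www.example.com/' + t for t in pages.split('^') if t]
        if not urls:
            # no child pages: the item is already complete
            yield item
        else:
            # visit the first page; stash the item and the remaining URLs in meta
            yield Request(urls[0], callback=self.parse3,
                          meta={'item': item, 'urls': urls[1:]}, dont_filter=True)

    def parse3(self, response):
        item = response.meta['item']
        item['COLLECT2'].append(response.xpath('/@id').extract_first())
        urls = response.meta['urls']
        if urls:
            # more pages left: chain to the next one, carrying the item along
            yield Request(urls[0], callback=self.parse3,
                          meta={'item': item, 'urls': urls[1:]}, dont_filter=True)
        else:
            # all pages visited: emit the single combined dict
            yield item

Chaining the requests sequentially trades away some crawl parallelism, but it guarantees the item is complete when it is finally yielded, so the pipeline sees exactly one combined dict per parse2 response.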
Upvotes: 2