Reputation: 589
My spider needs to adapt to the site I am scraping: the info I need is sometimes in div[1] and sometimes in div[2]. For example:
item['details'] = site.select('//*[@id="detailFacts"]/div[2]/div[2]//text()').extract()
or
item['details'] = site.select('//*[@id="detailFacts"]/div[1]/div[2]//text()').extract()
How do I combine these into a single statement so that Scrapy extracts the text from EITHER location?
Upvotes: 2
Views: 231
Reputation: 474201
Give this a try:
details = site.select('//*[@id="detailFacts"]/div[1]/div[2]//text()|//*[@id="detailFacts"]/div[2]/div[2]//text()').extract()
item['details'] = next((s for s in details if s.strip()), None)  # first non-blank text node, or None if nothing matched
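or, more compactly (note that | joins two complete paths, so writing div[1]|div[2] in the middle of a path does not work; a position predicate does):
details = site.select('//*[@id="detailFacts"]/div[position() <= 2]/div[2]//text()').extract()
item['details'] = next((s for s in details if s.strip()), None)  # first non-blank text node, or None if nothing matched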
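If you would rather take the text from whichever location is populated first instead of merging both, a fallback loop is another option. Below is a minimal sketch of that idea. It is not Scrapy code: it uses the standard library's xml.etree.ElementTree as a stand-in so it runs without dependencies, and the names CANDIDATE_PATHS, extract_details, PAGE_A and PAGE_B, as well as the sample markup, are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical candidate locations, tried in order; adjust to the real page.
CANDIDATE_PATHS = [
    ".//*[@id='detailFacts']/div[1]/div[2]",
    ".//*[@id='detailFacts']/div[2]/div[2]",
]


def extract_details(root):
    """Return the non-blank text nodes of the first path that yields any text."""
    for path in CANDIDATE_PATHS:
        for node in root.findall(path):
            texts = [t.strip() for t in node.itertext() if t.strip()]
            if texts:
                return texts
    return []  # nothing matched in either location


# Made-up layout A: the details sit in div[1]/div[2].
PAGE_A = ET.fromstring(
    "<html><body><div id='detailFacts'>"
    "<div><div>Label</div><div>Built in <b>1999</b></div></div>"
    "<div><div>Other</div></div>"
    "</div></body></html>"
)

# Made-up layout B: div[1] has no second child, so the details sit in div[2]/div[2].
PAGE_B = ET.fromstring(
    "<html><body><div id='detailFacts'>"
    "<div><div>Label</div></div>"
    "<div><div>Other</div><div>3 bedrooms</div></div>"
    "</div></body></html>"
)

details_a = extract_details(PAGE_A)
details_b = extract_details(PAGE_B)
```

In a spider the same loop would run over the two XPath strings from the question with site.select(...).extract(), stopping at the first non-empty result. Unlike the | union, this never mixes text from both divs.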
Hope it works for you.
Upvotes: 2