user818190

Reputation: 589

Scrapy XPath: how to select text from either of two div positions

My spider needs to adapt to the site I am scraping: the info I need is sometimes in div[1] and sometimes in div[2]. Here's an example:

item['details'] = site.select('//*[@id="detailFacts"]/div[2]/div[2]//text()').extract()

or

item['details'] = site.select('//*[@id="detailFacts"]/div[1]/div[2]//text()').extract()

How do I combine these into a single statement so that Scrapy fetches the text from whichever of the two is present?

Upvotes: 2

Views: 231

Answers (1)

alecxe

Reputation: 474201

Give this a try:

details = site.select('//*[@id="detailFacts"]/div[1]/div[2]//text()|//*[@id="detailFacts"]/div[2]/div[2]//text()').extract()
item['details'] = next((s.strip() for s in details if s.strip()), None)  # first non-blank text node, or None if nothing matched

or

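details = site.select('//*[@id="detailFacts"]/div[position() <= 2]/div[2]//text()').extract()  # div[1] or div[2]; a bare "div[1]|div[2]" mid-path would split the whole expression in two
item['details'] = next((s.strip() for s in details if s.strip()), None)  # first non-blank text node, or None if nothing matched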
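
If you want to check the expression outside of a spider, here is a minimal sketch. It assumes lxml is installed and uses it directly instead of the Scrapy selector; the two sample pages and the `first_detail` helper are made up for illustration, so adjust them to your real markup.

```python
from lxml import html

# Hypothetical markup: the wanted text sits in the second inner div of
# either the first or the second outer div under #detailFacts.
PAGE_A = """
<div id="detailFacts">
  <div><div>label</div><div>in first div</div></div>
  <div><div>other</div></div>
</div>
"""
PAGE_B = """
<div id="detailFacts">
  <div><div>label</div></div>
  <div><div>other</div><div>in second div</div></div>
</div>
"""

# div[position() <= 2] matches the first and the second div child.
XPATH = '//*[@id="detailFacts"]/div[position() <= 2]/div[2]//text()'


def first_detail(page):
    """Return the first non-blank text node matched by XPATH, or None."""
    texts = html.fromstring(page).xpath(XPATH)
    return next((t.strip() for t in texts if t.strip()), None)


if __name__ == "__main__":
    print(first_detail(PAGE_A))
    print(first_detail(PAGE_B))
```

The same XPath string should work unchanged when passed to site.select(...) in the spider.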

Hope it works for you.

Upvotes: 2
