Reputation: 267
I want to extract the text outside the tag and match it with the text inside the span.
This is the code:
<div class="info">
<p>
<i class="icon-trending-up"></i>
<span>Rank:</span>
600
</p>
<p>
<i class="icon-play"></i>
<span>Total Videos:</span>
36
</p>
<p>
<i class="icon-bar-chart"></i>
<span>Video Views:</span>
1,815,767
</p>
<hr>
<p>
<i class="icon-user-plus"></i>
<span>Followers:</span>
732
</p>
</div>
I want to extract something like this in separate items.
item['rank'] = rank
Rank: 600
item['videos'] = videos
Total Videos: 36
item['views'] = views
Video Views: 1,815,767
I do not want the <p> tag below the <hr>.
This is what I have tried so far:
rank = response.xpath("//div[@class='info']//hr/preceding-sibling::p//text()='Videos:'").extract()
This is the result:
[u'0']
OR
rank = response.xpath("//div[@class='info']//hr/preceding-sibling::p/span[contains(text(), 'Videos:')]/text()|//hr/preceding-sibling::p//text()[not(parent::span)]").extract()
This is the result:
[u' 600', u'Total Videos:', u' 36', u' 1,815,767']
Basically, I want to extract the number based on the span text, with every <p> tag in its own item.
Thank you
UPDATE
I can't use anything like p[1], p[2], etc., because those <p> tags may swap places, or there might be only 2 of them on other pages. The <span> text will remain the same.
Upvotes: 0
Views: 775
Reputation: 10666
What about:
item["rank"] = response.xpath('//span[.="Rank:"]/following-sibling::text()[1]').extract_first()
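item["videos"] = response.xpath('//span[.="Total Videos:"]/following-sibling::text()[1]').extract_first()
item["views"] = response.xpath('//span[.="Video Views:"]/following-sibling::text()[1]').extract_first()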
Upvotes: 2
Reputation: 10210
This should work. It looks a bit clumsy because it has to deal with the nested elements.
item['rank'] = ''.join(s.strip() for s in response.xpath('//div//span[contains(., "Rank:")]/ancestor::p/text()').extract())
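To show why the join and strip are there: ancestor::p/text() returns every text node that sits directly inside the <p>, including the whitespace around the <i> and <span>. A small sketch with a made-up list of text nodes (the exact whitespace on the real page is an assumption, and clean_value is just an illustrative name):

```python
def clean_value(text_nodes):
    """Strip each text node and join what is left into one string."""
    return "".join(s.strip() for s in text_nodes)


# Hypothetical result of .extract() for the "Rank:" paragraph:
# two whitespace-only nodes, then the node holding the number
raw_nodes = ["\n", "\n", "\n600\n"]

rank = clean_value(raw_nodes)
print("Rank: " + rank)
```

The whitespace-only nodes collapse to empty strings, so only the number is left.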
Upvotes: 1