Reputation: 13539
I'm trying to select the ingredients in an ingredients list, but there are also tooltips scattered amongst them (on the BBC Good Food site).
As a stripped-down example:
<li class="ingredients-list__item" itemprop="ingredients">
400g
<a href="/glossary/new-potatoes" class="ingredients-list__glossary-link tooltip-processed">
new potato
<div id="gf-tooltip-0" class="gf-tooltip" role="tooltip">
<div class="gf-tooltip__content">
<div class="gf-tooltip__text">
<p>unwanted tooltip</p>
</div>
</div>
</div>
</a>, halved if large
<span class="ingredients-list__glossary-element" id="ingredients-glossary"></span>
</li>
I'm trying to end up with '400g new potato, halved if large', or equally good, ['400g', 'new potato', ', halved if large'].
Amongst other things I've tried:
s.xpath("//li[@class='ingredients-list__item'][not(div[@class='gf-tooltip'])]//text()").extract()
But this still returns the text in the tooltip div.
Upvotes: 1
Views: 500
Reputation: 89325
One possible way is to exclude any text node that has a tooltip div among its ancestors
(broken into 2 lines for readability):
//li[@class='ingredients-list__item']
//text()[not(ancestor::div[@class='gf-tooltip'])]
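The attempt in the question puts the predicate on the li and only tests for a tooltip div that is a direct child, so it never filters individual text nodes. The expression above filters the text nodes themselves. With a Scrapy selector like the `s` in the question, it should be usable as a single string passed to `s.xpath(...).extract()`. To check the logic without Scrapy, here is a rough standard-library sketch. ElementTree has no `ancestor::` axis, so this walks the tree by hand rather than running the XPath. `SNIPPET` and `text_outside_tooltips` are hypothetical names, and it only works because the stripped-down example happens to be well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

# The stripped-down <li> from the question. It parses as XML only because
# every tag is closed; a real page would normally need an HTML parser.
SNIPPET = """<li class="ingredients-list__item" itemprop="ingredients">
400g
<a href="/glossary/new-potatoes" class="ingredients-list__glossary-link tooltip-processed">
new potato
<div id="gf-tooltip-0" class="gf-tooltip" role="tooltip">
<div class="gf-tooltip__content">
<div class="gf-tooltip__text">
<p>unwanted tooltip</p>
</div>
</div>
</div>
</a>, halved if large
<span class="ingredients-list__glossary-element" id="ingredients-glossary"></span>
</li>"""


def text_outside_tooltips(elem):
    """Yield text nodes under elem, skipping everything inside div.gf-tooltip."""
    if elem.tag == "div" and elem.get("class") == "gf-tooltip":
        return  # skip the whole subtree; the caller still yields this element's tail
    if elem.text:
        yield elem.text
    for child in elem:
        yield from text_outside_tooltips(child)
        if child.tail:
            yield child.tail


li = ET.fromstring(SNIPPET)
raw = list(text_outside_tooltips(li))

# Drop whitespace-only nodes and trim the rest.
parts = [t.strip() for t in raw if t.strip()]
print(parts)   # ['400g', 'new potato', ', halved if large']

# Join into one string, then remove the space left before punctuation.
joined = re.sub(r"\s+([,.;:])", r"\1", " ".join(parts))
print(joined)  # 400g new potato, halved if large
```

Either way the raw result still contains whitespace-only text nodes, so the strip-and-filter step is likely needed on the Scrapy output as well.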
Upvotes: 3