Reputation: 425
I am using Scrapy, a Python-based framework, to scrape a site, but I can't figure out how to select the text inside the element with class value ellipsis ph. Sometimes that element contains <strong> child tags. So far I have only succeeded in extracting the text that is not inside the <strong> tags.
<div class="right">
<div class="attrs">
<div class="attr">
<span class="name">Main Products:</span>
<div class="value ellipsis ph">
<!-- Here below I need to select this text, ignoring the strong tags -->
<strong>Shoes</strong>
(Sport
<strong>Shoes</strong>
,Casual
<strong>Shoes</strong>
,Hiking
<strong>Shoes</strong>
,Skate
<strong>Shoes</strong>
,Football
<strong>Shoes</strong>
)
</div>
</div>
</div>
</div>
<div class="right">
<div class="attrs">
<div class="attr">
<span class="name">Main Products:</span>
<div class="value ellipsis ph">
Cap, Shoe, Bag <!-- I can select this -->
</div>
</div>
</div>
</div>
Starting from the root of the selected node, this is what works to select just the text that is not inside the strong tags:
node.xpath("/div[@class='right']/div[@class='attrs']/div[@class='attr']/div/text()").extract()
Upvotes: 1
Views: 860
Reputation: 3691
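As @splash58 writes in the comment, the XPath
//div[@class="value ellipsis ph"]//text()
gets the text content of both divs. For the first div it naturally returns a list of text fragments, and that list includes the text inside the <strong> tags as well as the text outside them. This is because //text() selects every text node in the div's whole subtree, even inside child tags, whereas /text() selects only the div's direct text children, which is why the XPath in the question misses the words inside <strong>.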
Upvotes: 2
Reputation: 20748
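Assuming you want the text representation of the div elements with class value ellipsis ph, you can either:
- select all descendant text nodes with .//text() and join them, or
- apply the XPath string() function to each div element.
Here are the 2 options in action: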
>>> selector = scrapy.Selector(text="""<div class="right">
... <div class="attrs">
... <div class="attr">
... <span class="name">Main Products:</span>
... <div class="value ellipsis ph">
... <!-- // Here below i needed to select it ignoring the strong tag -->
... <strong>Shoes</strong>
... (Sport
... <strong>Shoes</strong>
... ,Casual
... <strong>Shoes</strong>
... ,Hiking
... <strong>Shoes</strong>
... ,Skate
... <strong>Shoes</strong>
... ,Football
... <strong>Shoes</strong>
... )
... </div>
... </div>
... </div>
... </div>
...
...
... <div class="right">
... <div class="attrs">
... <div class="attr">
... <span class="name">Main Products:</span>
... <div class="value ellipsis ph">
... Cap, Shoe, Bag <!-- // could select this -->
...
... </div>
... </div>
... </div>
... </div>""")
>>> for div in selector.css('div.value.ellipsis.ph'):
...     print("---")
...     print("".join(div.xpath('.//text()').extract()))
...
---
Shoes
(Sport
Shoes
,Casual
Shoes
,Hiking
Shoes
,Skate
Shoes
,Football
Shoes
)
---
Cap, Shoe, Bag
>>> for div in selector.css('div.value.ellipsis.ph'):
...     print("---")
...     print(div.xpath('string()').extract_first())
...
---
Shoes
(Sport
Shoes
,Casual
Shoes
,Hiking
Shoes
,Skate
Shoes
,Football
Shoes
)
---
Cap, Shoe, Bag
>>>
Upvotes: 2