Jide Koso

Reputation: 425

xpath selecting node with condition

I am using Scrapy, a Python-based framework, to scrape a site, but I can't figure out how to select the text of the div with the class value ellipsis ph. Sometimes that div contains strong tags. So far I have only succeeded in extracting the text that is not inside a strong child tag.

<div class="right">
    <div class="attrs">
        <div class="attr">
            <span class="name">Main Products:</span>
                <div class="value ellipsis ph">
 // Here below i needed to select it ignoring the strong tag
                    <strong>Shoes</strong> 
                    (Sport
                    <strong>Shoes</strong>
                    ,Casual
                    <strong>Shoes</strong>
                    ,Hiking
                    <strong>Shoes</strong>
                    ,Skate
                    <strong>Shoes</strong>
                    ,Football
                    <strong>Shoes</strong>
                    )
                </div>
        </div>
    </div>
</div>


<div class="right">
    <div class="attrs">
        <div class="attr">
            <span class="name">Main Products:</span>
                <div class="value ellipsis ph">
                    Cap, Shoe, Bag // could select this

                </div>
        </div>
    </div>
</div>

Starting from the root of the selected node, here is what works so far, but it selects only the text outside the strong tags:

"/div[@class='right']/div[@class='attrs']/div[@class='attr']/div/text()").extract()

Upvotes: 1

Views: 860

Answers (2)

GHajba

Reputation: 3691

As @splash58 notes in the comments, the

//div[@class="value ellipsis ph"]//text()

XPath gets the text of both divs. For the first div the result is a list of text nodes that includes both the text inside the <strong> tags and the text between them. This works because //text() selects every descendant text node in the subtree, whereas /text() selects only the text nodes that are direct children of the div.
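
To make the difference between child and descendant text nodes concrete, here is a minimal sketch that uses Python's standard-library ElementTree instead of Scrapy, so it is only an analogue of the XPath behaviour. The shortened FRAGMENT string is an assumption made for illustration, not the full markup from the question.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the first div (assumed sample for illustration).
FRAGMENT = (
    '<div class="value ellipsis ph">'
    "<strong>Shoes</strong> (Sport <strong>Shoes</strong> ,Casual <strong>Shoes</strong> )"
    "</div>"
)

div = ET.fromstring(FRAGMENT)

# Analogue of div/text(): only text nodes that are direct children of the div.
# ElementTree stores them as div.text plus the .tail of each child element.
child_text = [div.text or ""] + [child.tail or "" for child in div]
child_text = [t for t in child_text if t]

# Analogue of div//text(): every text node in the subtree, in document order.
descendant_text = list(div.itertext())

print(child_text)       # text between the <strong> tags only
print(descendant_text)  # text inside and between the <strong> tags
```

The first list misses every "Shoes" because those text nodes belong to the strong elements, which is the same reason the /div/text() expression in the question skips them.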

Upvotes: 2

paul trmbrth

Reputation: 20748

Assuming you want the text representation of the div elements with class value ellipsis ph, you can:

  • either select all descendant text nodes, not only the direct children, using .//text(), and join them
  • or apply XPath's string() function to the div element, which returns the concatenated text of the whole subtree

Here are both options in action:

>>> import scrapy
>>> selector = scrapy.Selector(text="""<div class="right">
...     <div class="attrs">
...         <div class="attr">
...             <span class="name">Main Products:</span>
...                 <div class="value ellipsis ph">
...  <!-- // Here below i needed to select it ignoring the strong tag -->
...                     <strong>Shoes</strong> 
...                     (Sport
...                     <strong>Shoes</strong>
...                     ,Casual
...                     <strong>Shoes</strong>
...                     ,Hiking
...                     <strong>Shoes</strong>
...                     ,Skate
...                     <strong>Shoes</strong>
...                     ,Football
...                     <strong>Shoes</strong>
...                     )
...                 </div>
...         </div>
...     </div>
... </div>
... 
... 
... <div class="right">
...     <div class="attrs">
...         <div class="attr">
...             <span class="name">Main Products:</span>
...                 <div class="value ellipsis ph">
...                     Cap, Shoe, Bag <!-- // could select this -->
... 
...                 </div>
...         </div>
...     </div>
... </div>""")
>>> for div in selector.css('div.value.ellipsis.ph'):
...     print("---")
...     print("".join(div.xpath('.//text()').extract()))
... 
---


                    Shoes 
                    (Sport
                    Shoes
                    ,Casual
                    Shoes
                    ,Hiking
                    Shoes
                    ,Skate
                    Shoes
                    ,Football
                    Shoes
                    )

---

                    Cap, Shoe, Bag 


>>> for div in selector.css('div.value.ellipsis.ph'):
...     print("---")
...     print(div.xpath('string()').extract_first())
... 
---


                    Shoes 
                    (Sport
                    Shoes
                    ,Casual
                    Shoes
                    ,Hiking
                    Shoes
                    ,Skate
                    Shoes
                    ,Football
                    Shoes
                    )

---

                    Cap, Shoe, Bag 


>>> 
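
Both outputs above still carry the indentation and newlines of the source HTML. A minimal sketch of one way to tidy that up in plain Python follows; clean_text is a hypothetical helper name, and the fragments list is an assumed sample shaped like what .//text() extraction returns, not real scraped data.

```python
def clean_text(fragments):
    """Join text fragments and collapse every run of whitespace to one space."""
    return " ".join("".join(fragments).split())


# Assumed sample, shaped like the list returned by div.xpath('.//text()').extract()
fragments = [
    "\n    ",
    "Shoes",
    " \n    (Sport\n    ",
    "Shoes",
    "\n    ,Casual\n    ",
    "Shoes",
    "\n    )\n",
]

print(clean_text(fragments))
```

If you would rather stay in XPath, normalize-space() applied to the div should give a similar result, since it trims the string value of the node and collapses internal whitespace.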

Upvotes: 2
