crypterr

Reputation: 13

Select text from multiple sub nodes in an xpath

I need to use XPath with lxml in Python 2.6 to extract two text items:

-Name One Type 1 Description 1

-Name Two Type 2 Description 2

I've tried the following XPath: '//*[@id="results"]/li/div/p/child::*/text()'. However, this gives me only the following text:

-Name One Type 1

-Name Two Type 2

Any suggestions on the correct XPath to use?

<div id="container">
  <ol id="results">
   <li class="mod1" data-li-position="0">
    <a href="first.link"><img src="image001.jpg"></a>
    <div class="bd">
     <h3>
      <a href="some.link">Category 1</a>
     </h3>
     <p class="description">
       <strong class="highlight">Name One</strong>
       <strong class="highlight">Type 1</strong>
       Description 1
     </p>
    </div>
   </li>
   <li class="mod2" data-li-position="1">
    <a href="second.link"><img src="image002.jpg"></a>
    <div class="bd">
     <h3>
      <a href="another.link">Category 2</a>
     </h3>
     <p class="description">
       <strong class="highlight">Name Two</strong>
       Description 2
       <strong class="highlight">Type 2</strong>
     </p>
    </div>
   </li>
  </ol>
</div>

Upvotes: 1

Views: 787

Answers (1)

har07

Reputation: 89325

The last part of your XPath:

...../p/child::*/text()

... selects only the text nodes that are children of the child elements of <p>, that is, the text inside the <strong> elements. That is why you miss, for example, Description 1: it is a direct child of <p>, not of a <strong>. Try changing that part as follows:

...../p//text()

The XPath above selects all text nodes that are descendants of <p>, in other words, all text nodes anywhere within <p>.
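To make the difference concrete, here is a minimal sketch with lxml. It assumes a trimmed-down copy of the markup from the question (the `SNIPPET` string) and two helper names, `select_texts` and `texts_per_paragraph`, which are made up for illustration. It compares text nodes that are direct children of <p>, text nodes inside the child elements of <p>, and all descendant text nodes, then joins the descendant text for each <p> separately.

```python
from lxml import html

# Trimmed-down copy of the markup from the question (assumption: only the
# parts relevant to the XPath are kept).
SNIPPET = """
<ol id="results">
  <li>
    <div class="bd">
      <p class="description">
        <strong class="highlight">Name One</strong>
        <strong class="highlight">Type 1</strong>
        Description 1
      </p>
    </div>
  </li>
  <li>
    <div class="bd">
      <p class="description">
        <strong class="highlight">Name Two</strong>
        Description 2
        <strong class="highlight">Type 2</strong>
      </p>
    </div>
  </li>
</ol>
"""

tree = html.fromstring(SNIPPET)
BASE = '//*[@id="results"]/li/div/p'


def select_texts(tree, expr):
    """Return the non-blank text nodes matched by expr, whitespace-stripped."""
    return [t.strip() for t in tree.xpath(expr) if t.strip()]


def texts_per_paragraph(tree):
    """Join all descendant text of each <p> into one string per <p>."""
    joined = []
    for p in tree.xpath(BASE):
        parts = [t.strip() for t in p.xpath('.//text()')]
        joined.append(' '.join(part for part in parts if part))
    return joined


# Text nodes that are direct children of <p>: only the descriptions.
direct = select_texts(tree, BASE + '/text()')
# Text nodes inside the child elements of <p>: only the <strong> contents.
nested = select_texts(tree, BASE + '/*/text()')
# All text nodes anywhere below <p>, in document order.
everything = select_texts(tree, BASE + '//text()')

per_item = texts_per_paragraph(tree)
print(per_item)
```

Note that the text nodes come back in document order, so the second item reads "Name Two Description 2 Type 2", following the order of the nodes in the markup. If only the joined string is needed, evaluating the XPath function `normalize-space(.)` on each <p> element should give the same result without the manual stripping.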

Upvotes: 2
