Reputation: 13
I need to use XPath with lxml in Python 2.6 to extract two text items:
-Name One Type 1 Description 1
-Name Two Type 2 Description 2
I've tried using the following XPath: '//*[@id="results"]/li/div/p/child::text()'. However, this gives me only the following text:
-Name One Type 1
-Name Two Type 2
Any suggestions on the correct XPath to use?
<div id="container">
  <ol id="results">
    <li class="mod1" data-li-position="0">
      <a href="first.link"><img src="image001.jpg"></a>
      <div class="bd">
        <h3>
          <a href="some.link">Category 1</a>
        </h3>
        <p class="description">
          <strong class="highlight">Name One</strong>
          <strong class="highlight">Type 1</strong>
          Description 1
        </p>
      </div>
    </li>
    <li class="mod2" data-li-position="1">
      <a href="second.link"><img src="image002.jpg"></a>
      <div class="bd">
        <h3>
          <a href="another.link">Category 2</a>
        </h3>
        <p class="description">
          <strong class="highlight">Name Two</strong>
          Description 2
          <strong class="highlight">Type 2</strong>
        </p>
      </div>
    </li>
  </ol>
</div>
Upvotes: 1
Views: 787
Reputation: 89325
The output you describe is what you get when the last part of the XPath selects text nodes of the child elements of <p>, for example:
...../p/child::*/text()
This selects only text nodes that are children of a child of <p> (here, the <strong> elements). That is why you missed, for example, Description 1: it is a direct child of <p>, not a child of one of its child elements. You can change that part as follows:
...../p//text()
This XPath selects all text nodes that are descendants of <p>, in other words, all text nodes anywhere within <p>.
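As a rough illustration of the difference, here is a minimal sketch using lxml.html on a trimmed-down copy of the markup from the question. The SNIPPET string and the description_texts helper are made up for this example, and the whitespace stripping is an assumption about how you want the pieces joined.

```python
from lxml import html

# Trimmed-down copy of the markup from the question (illustrative only).
SNIPPET = """
<ol id="results">
  <li class="mod1">
    <div class="bd">
      <p class="description">
        <strong class="highlight">Name One</strong>
        <strong class="highlight">Type 1</strong>
        Description 1
      </p>
    </div>
  </li>
  <li class="mod2">
    <div class="bd">
      <p class="description">
        <strong class="highlight">Name Two</strong>
        Description 2
        <strong class="highlight">Type 2</strong>
      </p>
    </div>
  </li>
</ol>
"""

root = html.fromstring(SNIPPET)


def description_texts(tree):
    """Return one joined string per <p>, built from all descendant text nodes."""
    results = []
    for p in tree.xpath('//*[@id="results"]/li/div/p'):
        # .//text() returns every text node inside <p>, including
        # whitespace-only ones, so strip each piece and drop the empty ones.
        parts = [t.strip() for t in p.xpath('.//text()')]
        results.append(' '.join(part for part in parts if part))
    return results


texts = description_texts(root)

# For contrast: text nodes of the child elements of the first <p> only.
first_p = root.xpath('//*[@id="results"]/li/div/p')[0]
strong_only = [t.strip() for t in first_p.xpath('child::*/text()')]

print(texts)
print(strong_only)
```

Text nodes come back in document order, so the second item is returned as "Name Two Description 2 Type 2". If you need the type before the description, you would have to query the <strong> elements and the direct text of <p> separately and combine them yourself.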
Upvotes: 2