Reputation: 17792
Here is the HTML code:
<div id="someid">
<h2>Specific text 1</h2>
<a class="hyperlinks" href="link"> link1 inside specific text 1</a>
<a class="hyperlinks" href="link"> link2 inside specific text 1</a>
<a class="hyperlinks" href="link"> link3 inside specific text 1</a>
<h2>Specific text 2</h2>
<a class="hyperlinks" href="link"> link1 inside specific text 2</a>
<a class="hyperlinks" href="link"> link2 inside specific text 2</a>
<a class="hyperlinks" href="link"> link3 inside specific text 2</a>
<a class="hyperlinks" href="link"> link4 inside specific text 2</a>
<h2>Specific text 3</h2>
<a class="hyperlinks" href="link"> link1 inside specific text 3</a>
<a class="hyperlinks" href="link"> link2 inside specific text 3</a>
</div>
I need to find the links under each "Specific text" heading separately. The problem is that if I write the following code in Python:
links = root.xpath("//div[@id='someid']//a")
for link in links:
    # Print the href of every link inside the div
    print(link.attrib['href'])
It prints ALL the links regardless of which "Specific text x" they belong to, whereas I want something like:
print("link under Specific text:" + specific + " link:" + link.attrib['href'])
Please suggest how to do this.
Upvotes: 1
Views: 1040
Reputation: 24816
I think you will need one XPath expression for each h2 specific text.
Given the text of an h2, you can get the a siblings that directly follow it (up to the next h2) with:
//div[@id='someid']/h2[.='Specific text 1']
    /following-sibling::a[
        count(. | following-sibling::h2[1]/preceding-sibling::*)
        = count(following-sibling::h2[1]/preceding-sibling::*)
        and preceding-sibling::h2[1][.='Specific text 1']]
|
//div[@id='someid']/h2[.='Specific text 1' and not(following-sibling::h2[1])]
    /following-sibling::a
The second //h2 selection handles the case where the h2 is the last one.
The expression above just exploits the XPath 1.0 intersection formula:
$ns1[count(.|$ns2)=count($ns2)]
You can find a lot of resources about this method, including many answers here on SO (check my answers as well). I think it's not difficult to understand how to apply this formula; what is difficult is to understand when it must be applied.
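To make the counting trick concrete, here is a minimal sketch that uses Python sets as a stand-in for XPath node-sets. The helper name intersect and the sample node labels are made up for illustration; they are not part of any XPath library.

```python
def intersect(ns1, ns2):
    """Keep the items of ns1 that are also in ns2, using the counting trick."""
    ns2 = set(ns2)
    # count(. | $ns2) = count($ns2): adding the item to ns2 does not grow it,
    # which is only possible if the item was already a member of ns2.
    return [node for node in ns1 if len({node} | ns2) == len(ns2)]


# Hypothetical labels: the a siblings that follow an h2 ...
following_a = ["a1", "a2", "a3", "a4"]
# ... and the siblings that precede the next h2.
before_next_h2 = ["h2_1", "a1", "a2", "a3"]

print(intersect(following_a, before_next_h2))
```

When the second set is empty, nothing passes the test, which mirrors why the expression needs a separate branch for the last h2.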
Credit for the formula goes to @Michael Kay. Just google it a bit.
My expression has been extended with additional predicates to handle your specific case, and unified (|) with an additional expression to handle the last h2.
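If a single XPath per heading feels heavy, a different route is to walk the children of the div in document order and remember the last h2 seen. This is not the XPath technique above, just a rough sketch using the standard library's xml.etree.ElementTree; it assumes the markup is well-formed enough to parse as XML, and the function name links_by_heading and the href values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the href values are hypothetical so the groups are visible.
HTML = """<div id="someid">
<h2>Specific text 1</h2>
<a class="hyperlinks" href="link1-1"> link1 inside specific text 1</a>
<a class="hyperlinks" href="link1-2"> link2 inside specific text 1</a>
<h2>Specific text 2</h2>
<a class="hyperlinks" href="link2-1"> link1 inside specific text 2</a>
</div>"""


def links_by_heading(div):
    """Map each h2 text to the hrefs of the a elements that follow it."""
    groups = {}
    current = None
    for child in div:  # direct children, in document order
        if child.tag == "h2":
            current = child.text
            groups[current] = []
        elif child.tag == "a" and current is not None:
            groups[current].append(child.attrib["href"])
    return groups


div = ET.fromstring(HTML)
for heading, hrefs in links_by_heading(div).items():
    for href in hrefs:
        print("link under " + heading + " link: " + href)
```

The same loop should carry over to an lxml element, since lxml elements can also be iterated over their children and expose tag, text and attrib.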
Upvotes: 1
Reputation:
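You could use the starts-with(s, t) function to build a matching condition on the h2 value. It is part of XPath 1.0, so it does not need an XPath 2.0 implementation. Note that the a elements are siblings of the h2, not descendants, so the step after the predicate has to be following-sibling::a rather than //a:
//div/h2[starts-with(., 'Specific text')]/following-sibling::a
This matches the links after every such h2 without grouping them per heading, so you will probably have to change the condition for your needs.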
Upvotes: 0