Reputation: 4941
I know that XPath lets you find a <div> containing an <a> tag using '//div[a]', but what if I want a <div> with both an <a> tag AND a <p> tag?
I tried '//div[a][p]'.
I also tried '//div[a|p]', which I thought would give divs with either <a> or <p> tags, so that I could check afterwards whether each <div> contained both an <a> and a <p>. However, none of the returned divs contain a <p>, only <a>s, even though I know there are <div>s containing both.
Upvotes: 0
Views: 346
Reputation: 18563
If you want to select only the <div> elements that have <a> and <p> as children (immediate descendants), then your XPath expressions are correct and the problem lies elsewhere.
If you mean to select <div> elements that contain <a> and <p> at any depth, you should use the descendant axis instead:
//div[descendant::a and descendant::p]
It will select all of the following <div> elements:
<root>
  <div>
    <a>Dolor</a>
    <p>et calculum</p>
  </div>
  <div>
    <a>Dolor<p>et calculum</p></a>
  </div>
  <div>
    <ul>
      <li><a>Foo</a></li>
    </ul>
    <p>Bar</p>
  </div>
</root>
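To make the child-versus-descendant distinction concrete, here is a minimal sketch that runs the expressions from this thread against the sample document using lxml. The id attributes and the ids() helper are assumptions added purely so the results are easy to read; they are not part of the original sample.

```python
from lxml import etree

# Sample document from the answer; the id attributes are hypothetical
# labels added here only to make the output readable.
XML = """
<root>
  <div id="siblings"><a>Dolor</a><p>et calculum</p></div>
  <div id="nested"><a>Dolor<p>et calculum</p></a></div>
  <div id="deep"><ul><li><a>Foo</a></li></ul><p>Bar</p></div>
</root>
"""
root = etree.fromstring(XML)


def ids(expr):
    """Return the id of every <div> matched by the XPath expression."""
    return [div.get("id") for div in root.xpath(expr)]


# Predicates on the child axis: <a> and <p> must be direct children.
children_only = ids("//div[a][p]")
children_and = ids("//div[a and p]")

# Union inside the predicate: at least one <a> OR <p> child.
either_child = ids("//div[a|p]")

# Descendant axis: <a> and <p> may appear at any depth.
any_depth = ids("//div[descendant::a and descendant::p]")

print(children_only, children_and, either_child, any_depth)
```

Under these assumptions, only the first <div> should match the child-axis expressions, while the descendant-axis expression should match all three. If '//div[a][p]' returns nothing on a real page, it is worth checking whether the <p> is actually a direct child of the <div>.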
Upvotes: 2
Reputation: 142216
I'm sure there's a nicer way, but an immediate kludge is something like:
set(tree.xpath('//div[a]')).intersection(tree.xpath('//div[p]'))
Or this monstrosity keeping to plain XPath:
tree.xpath('//div[a][count(. | //div[p]) = count(//div[p])]')
If lxml supported XPath 2.0, you'd have an intersect operator, but alas...
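For comparison, here is a small sketch of the two approaches above next to the XPath 1.0 'and' operator, which lxml does support. The three-element document and its id attributes are made up for illustration. Note that a Python set does not preserve document order, so the sketch also shows an order-preserving variant.

```python
from lxml import etree

# Hypothetical test document: only the first <div> has both children.
root = etree.fromstring(
    "<root>"
    "<div id='1'><a/><p/></div>"
    "<div id='2'><a/></div>"
    "<div id='3'><p/></div>"
    "</root>"
)

with_a = root.xpath("//div[a]")
with_p = root.xpath("//div[p]")

# Set intersection of the two result lists (document order is lost).
both_set = set(with_a).intersection(with_p)

# Order-preserving variant: filter one list by membership in the other.
p_set = set(with_p)
both_ordered = [div for div in with_a if div in p_set]

# The count()-based intersection idiom in plain XPath 1.0.
both_count = root.xpath("//div[a][count(. | //div[p]) = count(//div[p])]")

# XPath 1.0 'and' inside a single predicate.
both_and = root.xpath("//div[a and p]")

print([div.get("id") for div in both_and])
```

With this input, all four approaches should select the same single element.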
Upvotes: 1