user1136342

Reputation: 4941

XPath: finding divs containing certain tags

I know that XPath lets you find a div with an <a> tag using

'//div[a]'

but what if I want a div with both an <a> tag AND a <p> tag?

I tried doing '//div[a][p]'.

I also tried '//div[a|p]', which I thought would give divs with either <a> or <p> tags, so that I could check afterwards whether each <div> contained both an <a> and a <p>. But none of the returned divs contain a <p>, only <a>s, even though I know there are <div>s containing both.

Upvotes: 0

Views: 346

Answers (2)

toniedzwiedz

Reputation: 18563

If you want to select only the <div> elements that have <a> and <p> as children (immediate descendants), then your '//div[a][p]' expression is correct and the problem lies elsewhere.

If you mean to select <div> elements that contain <a> and <p>, you should use the descendant axis instead.

//div[descendant::a and descendant::p]

It will select all of the following <div> elements:

<root>
  <div>
    <a>Dolor</a>
    <p>et calculum</p>
  </div>
  <div>
    <a>Dolor<p>et calculum</p></a>
  </div>
  <div>
    <ul>
      <li><a>Foo</a></li>
    </ul>
    <p>Bar</p>
  </div>
</root>
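
To make the child/descendant difference concrete, here is a minimal sketch run against the sample document. It assumes the standard library's xml.etree.ElementTree rather than lxml, so it has no third-party dependency. ElementTree supports only a subset of XPath and has no descendant:: axis in predicates, so the descendant test is emulated in Python; with lxml you could pass the expression above straight to tree.xpath().

```python
import xml.etree.ElementTree as ET

# Sample document from this answer; the stray <li> is written as a
# closing </li> so that the document parses.
DOC = """
<root>
  <div>
    <a>Dolor</a>
    <p>et calculum</p>
  </div>
  <div>
    <a>Dolor<p>et calculum</p></a>
  </div>
  <div>
    <ul>
      <li><a>Foo</a></li>
    </ul>
    <p>Bar</p>
  </div>
</root>
"""

root = ET.fromstring(DOC)

# Child test: chained [tag] predicates, the ElementTree spelling of //div[a][p].
child_matches = root.findall(".//div[a][p]")

# Descendant test: emulate //div[descendant::a and descendant::p],
# because ElementTree predicates cannot use the descendant axis.
descendant_matches = [
    div for div in root.iter("div")
    if div.find(".//a") is not None and div.find(".//p") is not None
]

print(len(child_matches), len(descendant_matches))
```

Only the first <div> has both <a> and <p> as direct children, while all three contain them somewhere below.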

Upvotes: 2

Jon Clements

Reputation: 142216

I'm sure there's a nicer way, but an immediate kludge is something like:

set(tree.xpath('//div[a]')).intersection(tree.xpath('//div[p]'))

Or this monstrosity keeping to plain XPath:

tree.xpath('//div[a][count(. | //div[p]) = count(//div[p])]')

If lxml supported XPath 2.0, you'd have an intersect operator, but alas...
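
For what it's worth, a plain set loses document order. A rough sketch of an order-preserving variant of the intersection idea follows. It assumes the standard library's xml.etree.ElementTree instead of lxml, and the small document with its id attributes is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document; the id attributes only label the divs.
DOC = """
<root>
  <div id="only-a"><a>x</a></div>
  <div id="both"><a>x</a><p>y</p></div>
  <div id="only-p"><p>y</p></div>
</root>
"""

root = ET.fromstring(DOC)

# Divs with a <p> child, stored in a set for fast membership tests.
with_p = set(root.findall(".//div[p]"))

# Walk the divs with an <a> child in document order and keep those
# that are also in the <p> set.
both = [div for div in root.findall(".//div[a]") if div in with_p]

ids = [div.get("id") for div in both]
print(ids)
```

The same pattern should carry over to the two tree.xpath() result lists, since the membership test relies only on element identity.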

Upvotes: 1
