jonah_w

Reputation: 1032

XPath parent node's class should not contain specific string

I'm trying to find all divs whose class is 'phrase' and whose parent node's class is not 'extras'.

So in Python I'm using

for phrase in entry.iterfind(".//div[@class='phrase'] and ./parent::div[@class!='extras']]"):

to do that.

But it gives me the error:

SyntaxError: prefix 'parent' not found in prefix map

Then I changed the code to

for phrase in entry.iterfind(".//div[@class='phrase'] and ./..[@class!='extras']]"):

This time the error was

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xml/etree/ElementPath.py", line 272, in iterfind
    selector = _cache[cache_key]
KeyError: (".//div[@class='phrase'] and ./..[@class!='extras']]", None)

Part of the XML structure is as follows:

<div class="phrases">
    <div class="label">Phrases</div>
    <div class="phrase">
    ……

<div class="phrasal verbs">
    <div class="label">Phrases</div>
    <div class="phrase">
    ……

<div class="extras">
    <h2>test test</h2>
    <div class="phrase">
    ……

I'm using Python 3.7 and the xml.etree library on macOS 10.14.

Upvotes: 1

Views: 759

Answers (2)

Andersson

Reputation: 52665

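The problem is most likely your tool: xml.etree.ElementTree implements only a limited subset of XPath, which does not include axes such as parent:: or the and operator inside a predicate. Note also that in your expressions the first predicate is closed right after @class='phrase', so the and clause falls outside the brackets.

You can try lxml.html, which supports full XPath 1.0, to parse the same HTML document: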

from lxml import html

source = """<div class="phrases">
                <div class="label">Phrases</div>
                <div class="phrase">this</div>
            </div>

            <div class="phrasal verbs">
                <div class="label">Phrases</div>
                <div class="phrase">this</div>
            </div>

            <div class="extras">
                <h2>test test</h2>
                <div class="phrase">not this</div>
            </div>"""

dom = html.fromstring(source)
dom.xpath(".//div[@class='phrase' and ./parent::div[@class!='extras']]")

Output:

[<Element div at 0x7fb5218d5db8>, <Element div at 0x7fb521018728>] #  Two elements found

or

dom.xpath(".//div[@class='phrase' and ./parent::div[@class!='extras']]/text()")

Output:

['this', 'this']
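If installing lxml is not an option, the same filter can probably be done with the standard library alone. The following is only a minimal sketch, not a drop-in fix: the <root> wrapper, the sample texts, and the name parent_of are assumptions made for illustration. ElementTree elements keep no reference to their parent, so the sketch builds a child-to-parent map first and then checks the parent's class in plain Python.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the question; the <root> wrapper is an assumption
# so that the three divs share one parent element.
source = """<root>
  <div class="phrases"><div class="label">Phrases</div><div class="phrase">this</div></div>
  <div class="phrasal verbs"><div class="label">Phrases</div><div class="phrase">this too</div></div>
  <div class="extras"><h2>test test</h2><div class="phrase">not this</div></div>
</root>"""

entry = ET.fromstring(source)

# ElementTree has no parent pointer, so map every child to its parent once.
parent_of = {child: parent for parent in entry.iter() for child in parent}

# Use only the simple predicate ElementTree understands, then filter on the
# parent's class attribute in Python instead of in XPath.
phrases = [
    div
    for div in entry.iterfind(".//div[@class='phrase']")
    if parent_of[div].get("class") != "extras"
]

print([div.text for div in phrases])  # ['this', 'this too']
```

The map is built in one pass over the tree, so it is worth reusing if several parent lookups are needed.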

Upvotes: 1

Robert

Reputation: 171

You can use something like "//div[@class!='extras']/div[@class='phrase']". It should find all divs with class 'phrase' whose parent's class is not 'extras'.
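As far as I know, xml.etree only accepts the != comparison in predicates from Python 3.10 on, so on 3.7 this parent-first idea may need to be spelled out in Python. A rough sketch under that assumption (the <root> wrapper, the sample texts, and the name matches are made up for illustration): select every div that has a class attribute, skip the 'extras' ones, and collect their direct 'phrase' children.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the question; the <root> wrapper is an assumption.
source = """<root>
  <div class="phrases"><div class="label">Phrases</div><div class="phrase">this</div></div>
  <div class="phrasal verbs"><div class="label">Phrases</div><div class="phrase">this too</div></div>
  <div class="extras"><h2>test test</h2><div class="phrase">not this</div></div>
</root>"""

entry = ET.fromstring(source)

matches = []
# [@class] keeps only divs that carry a class attribute at all.
for parent in entry.iterfind(".//div[@class]"):
    if parent.get("class") == "extras":
        continue  # skip the parents we want to exclude
    # Direct children only, mirroring the single "/" step in the expression.
    matches.extend(parent.findall("./div[@class='phrase']"))

print([div.text for div in matches])  # ['this', 'this too']
```

Like the XPath expression, this ignores 'phrase' divs whose parent has no class attribute.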

Upvotes: 0
