nHaskins

Reputation: 827

Get etree Element with attribute, or containing subelement with attribute

I have an XML file to parse, and I need to find elements by id.

In the example below, I need to find the name of the driver, but I don't know whether my id belongs to the vehicle, the engine, or the block. I would like a solution that works with arbitrary XML inside vehicle (the driver element is guaranteed to exist).

<road>
    <vehicle id="16">
        <driver>Bob Johnson</driver>
        <engine id="532">
            <type>V8</type>
            <block id="113">
                <material>Aluminium</material>
            </block>
        </engine>
    </vehicle>
    <vehicle id="452">
        <driver>Dave Edwards</driver>
        <engine id="212">
            <type>Inline 6</type>
            <block id="381">
                <material>Cast Iron</material>
            </block>
        </engine>
    </vehicle>
</road>

What I have tried

I tried getting the element by its id and then, if it wasn't a vehicle tag, navigating up the tree to find the vehicle, but it seems Python's elem.find() returns None when the result lies outside elem.

Looking at the docs, they have this example:

# Nodes with name='Singapore' that have a 'year' child
root.findall(".//year/..[@name='Singapore']")

But I don't see how to make that work for any descendant, as opposed to a descendant on a specific level.

Upvotes: 3

Views: 4144

Answers (2)

UltraInstinct

Reputation: 44444

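Note: All the snippets below use the lxml library. To install it, run: pip install lxml.

Use root.xpath(...) rather than root.findall(...): lxml's xpath() evaluates full XPath 1.0 expressions, while findall() only supports the limited ElementPath subset.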

>>> root.xpath("//vehicle/driver/text()")
['Bob Johnson', 'Dave Edwards']

If you want to extract the driver's name for a given id, you'd do:

>>> vehicle_id = "16"
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (vehicle_id, vehicle_id))
['Bob Johnson']

UPDATE: The same expression returns the driver's name whether the id sits on the vehicle itself or on a descendant at any depth:

>>> i = '16'
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (i, i))
['Bob Johnson']
>>> i = '532'
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (i, i))
['Bob Johnson']
>>> i = '113'
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (i, i))
['Bob Johnson']
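
If installing lxml is not an option, the predicate above ("the vehicle has this id, or some descendant has it") can be approximated in plain Python with the standard library. This is only a minimal sketch: the helper name find_driver and the trimmed-down XML are assumptions made for illustration, not part of the question.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the question's data (illustrative only).
XML = """
<road>
    <vehicle id="16">
        <driver>Bob Johnson</driver>
        <engine id="532"><block id="113"/></engine>
    </vehicle>
    <vehicle id="452">
        <driver>Dave Edwards</driver>
        <engine id="212"><block id="381"/></engine>
    </vehicle>
</road>
"""


def find_driver(root, target_id):
    """Hypothetical helper: driver of the vehicle that has, or contains, target_id."""
    for vehicle in root.iter("vehicle"):
        # Element.iter() yields the vehicle itself first, then every descendant,
        # which mirrors "@id=X or .//*[@id=X]" in the XPath predicate.
        if any(elem.get("id") == target_id for elem in vehicle.iter()):
            return vehicle.findtext("driver")
    return None  # no vehicle carries or contains that id


root = ET.fromstring(XML)
for i in ("16", "532", "113", "381"):
    print(i, find_driver(root, i))
```

Attribute values are strings, so target_id has to be passed as a string here. The loop scans each vehicle in full, which is fine for small files but slower than a single XPath query on large ones.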

Upvotes: 2

alecxe

Reputation: 473833

If you know the id but don't know whether it belongs to a vehicle, engine, or block, you can solve this with a single XPath expression. You would have to use lxml.etree instead of xml.etree.ElementTree, though, because ElementTree's XPath support is very limited and has no ancestor axes. Use the ancestor-or-self axis:

input_id = "your ID"
print(root.xpath(".//*[@id='%s']/ancestor-or-self::vehicle/driver" % input_id)[0].text)

This would print:

  • Bob Johnson if input_id is 16, 532, or 113
  • Dave Edwards if input_id is 452, 212, or 381

Complete working example:

import lxml.etree as ET

data = """
<road>
    <vehicle id="16">
        <driver>Bob Johnson</driver>
        <engine id="532">
            <type>V8</type>
            <block id="113">
                <material>Aluminium</material>
            </block>
        </engine>
    </vehicle>
    <vehicle id="452">
        <driver>Dave Edwards</driver>
        <engine id="212">
            <type>Inline 6</type>
            <block id="381">
                <material>Cast Iron</material>
            </block>
        </engine>
    </vehicle>
</road>
"""

root = ET.fromstring(data)
for input_id in [16, 532, 113, 452, 212, 381]:
    print(root.xpath(".//*[@id='%s']/ancestor-or-self::vehicle/driver" % input_id)[0].text)

Prints:

Bob Johnson
Bob Johnson
Bob Johnson
Dave Edwards
Dave Edwards
Dave Edwards
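
For comparison, the upward walk that ancestor-or-self performs can be imitated with xml.etree.ElementTree alone. Its elements keep no reference to their parent, so the sketch below first builds a child-to-parent map. The function name driver_for_id and the shortened XML are assumptions for illustration only, and the sketch assumes each id occurs once in the document.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the data above (illustrative only).
XML = """
<road>
    <vehicle id="16">
        <driver>Bob Johnson</driver>
        <engine id="532"><block id="113"/></engine>
    </vehicle>
    <vehicle id="452">
        <driver>Dave Edwards</driver>
        <engine id="212"><block id="381"/></engine>
    </vehicle>
</road>
"""


def driver_for_id(root, target_id):
    """Hypothetical helper: climb from the element with target_id to its vehicle."""
    # Map every child to its parent, since ElementTree stores no parent pointer.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    # First descendant of root whose id attribute matches.
    elem = root.find(".//*[@id='%s']" % target_id)

    # Walk upward until a vehicle is reached or the tree runs out.
    while elem is not None and elem.tag != "vehicle":
        elem = parent_of.get(elem)

    return elem.findtext("driver") if elem is not None else None


root = ET.fromstring(XML)
for input_id in [16, 532, 113, 452, 212, 381]:
    print(driver_for_id(root, input_id))
```

Building the map costs one pass over the tree, so for repeated lookups it would make sense to build it once and reuse it rather than rebuilding it on every call as this sketch does.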

Upvotes: 1
