Reputation: 827
I have an XML file to parse, and I need to find elements by id.
In the example code, I need to find the name of the driver, but I don't know whether my id is for the vehicle, engine, or block. I would like a solution which would work with arbitrary XML inside of vehicle (but existence of driver is guaranteed).
<road>
  <vehicle id="16">
    <driver>Bob Johnson</driver>
    <engine id="532">
      <type>V8</type>
      <block id="113">
        <material>Aluminium</material>
      </block>
    </engine>
  </vehicle>
  <vehicle id="452">
    <driver>Dave Edwards</driver>
    <engine id="212">
      <type>Inline 6</type>
      <block id="381">
        <material>Cast Iron</material>
      </block>
    </engine>
  </vehicle>
</road>
What I have tried
I was trying to get the elements by their id and then, if they weren't vehicle tags, navigate up the tree to find the enclosing vehicle, but it seems Python's elem.find() returns None if the result is outside elem.
Looking at the docs, they have this example:
# Nodes with name='Singapore' that have a 'year' child
root.findall(".//year/..[@name='Singapore']")
But I don't see how to make that work for any descendant, as opposed to a descendant on a specific level.
Upvotes: 3
Views: 4144
Reputation: 44444
Note: All the snippets below use the lxml library. To install, run: pip install lxml
You should use root.xpath(..), not root.findall(..).
>>> root.xpath("//vehicle/driver/text()")
['Bob Johnson', 'Dave Edwards']
If you want to extract the driver's name for a given ID, you'd do:
>>> vehicle_id = "16"
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (vehicle_id, vehicle_id))
['Bob Johnson']
UPDATE: To get the driver's name for a given id nested at any level below the vehicle, you'd do:
>>> i = '16'
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (i, i))
['Bob Johnson']
>>> i = '532'
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (i, i))
['Bob Johnson']
>>> i = '113'
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (i, i))
['Bob Johnson']
Upvotes: 2
Reputation: 473833
If you know the id, but don't know whether it belongs to a vehicle, engine or block, you can approach it with an XPath expression, but you would have to use lxml.etree instead of xml.etree.ElementTree (which has very limited XPath support). Use the ancestor-or-self axis:
input_id = "your ID"
print(root.xpath(".//*[@id='%s']/ancestor-or-self::vehicle/driver" % input_id)[0].text)
This would print:
Bob Johnson if input_id is 16, 532 or 113
Dave Edwards if input_id is 452, 212 or 381
Complete working example:
import lxml.etree as ET
data = """
<road>
<vehicle id="16">
<driver>Bob Johnson</driver>
<engine id="532">
<type>V8</type>
<block id="113">
<material>Aluminium</material>
</block>
</engine>
</vehicle>
<vehicle id="452">
<driver>Dave Edwards</driver>
<engine id="212">
<type>Inline 6</type>
<block id="381">
<material>Cast Iron</material>
</block>
</engine>
</vehicle>
</road>
"""
root = ET.fromstring(data)
for input_id in [16, 532, 113, 452, 212, 381]:
    # Find the element with this id, then climb to its enclosing vehicle.
    print(root.xpath(".//*[@id='%s']/ancestor-or-self::vehicle/driver" % input_id)[0].text)
Prints:
Bob Johnson
Bob Johnson
Bob Johnson
Dave Edwards
Dave Edwards
Dave Edwards
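If installing lxml is not an option, the same lookup can be approximated with the standard library alone. Instead of climbing from the matched element (ElementTree elements keep no reference to their parent), this sketch walks each vehicle and checks whether it, or anything inside it, carries the id. It assumes the XML from the question; find_driver and DATA are hypothetical names, not part of either library:

```python
import xml.etree.ElementTree as ET

# Sample document taken from the question.
DATA = """
<road>
  <vehicle id="16">
    <driver>Bob Johnson</driver>
    <engine id="532">
      <type>V8</type>
      <block id="113"><material>Aluminium</material></block>
    </engine>
  </vehicle>
  <vehicle id="452">
    <driver>Dave Edwards</driver>
    <engine id="212">
      <type>Inline 6</type>
      <block id="381"><material>Cast Iron</material></block>
    </engine>
  </vehicle>
</road>
"""

root = ET.fromstring(DATA)


def find_driver(root, input_id):
    """Hypothetical helper: driver of the vehicle that has or contains input_id."""
    wanted = str(input_id)
    for vehicle in root.iter("vehicle"):
        # vehicle.iter() yields the vehicle itself and every descendant,
        # so this covers ids at any nesting depth.
        if any(elem.get("id") == wanted for elem in vehicle.iter()):
            return vehicle.findtext("driver")
    return None  # no vehicle has or contains this id


for input_id in [16, 532, 113, 452, 212, 381]:
    print(input_id, find_driver(root, input_id))
```

This scans every vehicle until it finds a match, which is fine for small documents; for large ones the lxml XPath version is likely the better fit.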
Upvotes: 1