udo

Reputation: 5180

How to query XML data with namespaces using XPath in Python

I am trying to apply an XPath query to XML data which has namespaces using the following code:

from lxml import etree
from io import StringIO
    
xml = '''
    <gpx creator="udos" version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin.com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.garmin.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">
     <metadata>
      <time>2015-07-07T15:16:40Z</time>
     </metadata>
     <trk>
      <name>some name</name>
      <trkseg>
       <trkpt lat="46.3884140" lon="10.0286290">
        <ele>2261.8</ele>
        <time>2015-07-07T15:30:42Z</time>
       </trkpt>
       <trkpt lat="46.3884050" lon="10.0286240">
        <ele>2261.6</ele>
        <time>2015-07-07T15:30:43Z</time>
       </trkpt>
       <trkpt lat="46.3884000" lon="10.0286210">
        <ele>2262.0</ele>
        <time>2015-07-07T15:30:46Z</time>
       </trkpt>
       <trkpt lat="46.3884000" lon="10.0286210">
        <ele>2261.8</ele>
        <time>2015-07-07T15:30:47Z</time>
       </trkpt>
      </trkseg>
     </trk>
    </gpx>
    '''
    
# simulate reading the XML above from a file
file = StringIO(xml)   # on Python 2 use "file = StringIO(unicode(xml))"
    
# reading the xml from a file
tree = etree.parse(file)
    
ns = {'xmlns': 'http://www.topografix.com/GPX/1/1',
      'xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
      'xmlns:gpxtpx': 'http://www.garmin.com/xmlschemas/TrackPointExtension/v1',
      'xmlns:gpxx': 'http://www.garmin.com/xmlschemas/GpxExtensions/v3'}
    
expr = 'trk/trkseg/trkpt/ele'
    
for element in tree.xpath(expr, namespaces=ns):
    print(element.text)

I expect the following output from the code:

2261.8
2261.6
2262.0
2261.8

When I substitute the XML root element

<gpx creator="udos" version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin.com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.garmin.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">

with

<gpx>

the code works.

Any suggestions on how to get it to work with the namespaces as well?

Upvotes: 1

Views: 1693

Answers (1)

Anand S Kumar

Reputation: 90889

You can define your namespaces like this:

ns = {'n': 'http://www.topografix.com/GPX/1/1',
      'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
      'gpxtpx': 'http://www.garmin.com/xmlschemas/TrackPointExtension/v1',
      'gpxx': 'http://www.garmin.com/xmlschemas/GpxExtensions/v3'}

This binds the prefix n to 'http://www.topografix.com/GPX/1/1', and you can then use that prefix in your XPath query. Example:

expr = 'n:trk/n:trkseg/n:trkpt/n:ele'

for element in tree.xpath(expr, namespaces=ns):
    print(element.text)

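This works because the root node declares 'http://www.topografix.com/GPX/1/1' as its default namespace (xmlns), so every unprefixed child element automatically inherits that namespace unless it uses a different prefix or declares a default namespace of its own. XPath 1.0 has no notion of a default namespace, so an unprefixed name such as trk in the expression only matches elements that are in no namespace, which is why the original query returned nothing. The prefix used in the XPath expression does not have to match anything in the document; only the namespace URI it is bound to matters.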
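To see the inherited namespace directly, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs, and a shortened copy of the GPX data; the variable names are only illustrative. Each element's tag is reported in {uri}localname form, an unprefixed path finds nothing, and a path using a prefix bound to the same URI matches.

```python
import xml.etree.ElementTree as ET

GPX_NS = 'http://www.topografix.com/GPX/1/1'

# Shortened copy of the GPX document from the question (assumption: two points only).
xml = '''<gpx xmlns="http://www.topografix.com/GPX/1/1">
 <trk>
  <trkseg>
   <trkpt lat="46.3884140" lon="10.0286290"><ele>2261.8</ele></trkpt>
   <trkpt lat="46.3884050" lon="10.0286240"><ele>2261.6</ele></trkpt>
  </trkseg>
 </trk>
</gpx>'''

root = ET.fromstring(xml)

# The child <trk> has no xmlns attribute of its own, yet its tag carries
# the namespace inherited from the root, written as {uri}localname.
trk = root[0]
print(root.tag)
print(trk.tag)

# An unprefixed path looks for elements in no namespace, so nothing matches.
unprefixed = root.findall('trk/trkseg/trkpt/ele')

# Binding an arbitrary prefix to the same URI makes the path match.
ns = {'n': GPX_NS}
prefixed = [e.text for e in root.findall('n:trk/n:trkseg/n:trkpt/n:ele', ns)]

print(unprefixed)
print(prefixed)
```

The same reasoning applies to tree.xpath() in lxml: the lookup is by namespace URI plus local name, and the prefix is just a local alias for the URI.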

Demo:

In [44]: ns = {'n': 'http://www.topografix.com/GPX/1/1',
   ....:       'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
   ....:       'gpxtpx': 'http://www.garmin.com/xmlschemas/TrackPointExtension/v1',
   ....:       'gpxx': 'http://www.garmin.com/xmlschemas/GpxExtensions/v3'}

In [45]: expr = 'n:trk/n:trkseg/n:trkpt/n:ele'

In [46]: for element in tree.xpath(expr, namespaces=ns):
   ....:     print(element.text)
   ....:
2261.8
2261.6
2262.0
2261.8
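
If you would rather not hard-code the URIs, one option is to build the prefix map from the declarations on the root element. The sketch below assumes lxml is installed and again uses a shortened copy of the data. In lxml, root.nsmap stores the default namespace under the key None, which XPath cannot use as a prefix, so the sketch renames it to n. That name is an arbitrary choice and would need changing if the document already declared a prefix n.

```python
from io import BytesIO

from lxml import etree

# Shortened copy of the GPX document from the question (assumption: two points only).
xml = b'''<gpx xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">
 <trk><trkseg>
  <trkpt lat="46.3884140" lon="10.0286290"><ele>2261.8</ele></trkpt>
  <trkpt lat="46.3884050" lon="10.0286240"><ele>2261.6</ele></trkpt>
 </trkseg></trk>
</gpx>'''

root = etree.parse(BytesIO(xml)).getroot()

# root.nsmap maps declared prefixes to URIs; the default namespace has the
# key None. XPath needs a real prefix, so rename None to 'n' (arbitrary).
ns = {(prefix if prefix is not None else 'n'): uri
      for prefix, uri in root.nsmap.items()}

# Evaluate the prefixed path relative to the root <gpx> element.
elevations = [e.text for e in root.xpath('n:trk/n:trkseg/n:trkpt/n:ele',
                                         namespaces=ns)]
print(elevations)
```

Note that nsmap on the root only covers namespaces declared on the root or above it; prefixes declared further down the tree would not appear in it.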

Upvotes: 3
