beoliver

Reputation: 5769

python xml extraction for loop

I have a bit of script that I think is nearly there. I have worked out a crude way of writing it, but I can't work out how to get it to function as a for loop.

I am extracting data from an xml file that uses the following format:

<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
    <Position>
        <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
        <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
<Trackpoint>
    <Time>2012-01-17T11:45:21Z</Time>
    <Position>
        <LatitudeDegrees>51.920243117958307</LatitudeDegrees>
        <LongitudeDegrees>26.706140967085958</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>

I can use the following to get, say, the LatitudeDegrees values:

from xml.dom.minidom import parse
doc = parse('/Users/name/Documents/GPS/gps.tcx')
lat = doc.getElementsByTagName("LatitudeDegrees")
lon = doc.getElementsByTagName("LongitudeDegrees")
time = doc.getElementsByTagName("Time")
trackpoint = doc.getElementsByTagName("Trackpoint")

for x in lat:
    print(x.firstChild.data)

But I would like to get the latitude, longitude and time together, in order.

I am guessing I need to use

for x in trackpoint:

but the only way I have managed to do that is as follows:

count = 0
n = len(trackpoint)
while count < n:
    print(time[count].firstChild.data)
    print(lat[count].firstChild.data)
    print(lon[count].firstChild.data)
    count += 1

Does anyone have any ideas? I think I am just missing something really simple!

Upvotes: 1

Views: 1777

Answers (3)

Anurag Uniyal

Reputation: 88865

I usually find parsing XML with ElementTree easier and more readable. For example, you can read the latitudes in three lines:

import xml.etree.ElementTree as etree

s="""<root>
<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
    <Position>
        <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
        <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
<Trackpoint>
    <Time>2012-01-17T11:45:21Z</Time>
    <Position>
        <LatitudeDegrees>51.920243117958307</LatitudeDegrees>
        <LongitudeDegrees>26.706140967085958</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
</root>
"""

root = etree.fromstring(s)
for point in root:
    print(point.find('Position/LatitudeDegrees').text)

Now suppose you want to convert each point to a dict:

varnames = [
    ('Position/LatitudeDegrees', 'lat'),
    ('Position/LongitudeDegrees', 'lon'),
    ('Time', 'time'),
    ('AltitudeMeters', 'alt')
    ]

points = []
for pointelem in etree.fromstring(s):
    point = {}
    for tag, varname in varnames:
        point[varname] = pointelem.find(tag).text
    points.append(point)

import pprint
pprint.pprint(points)

output:

[{'alt': '-43.6026611328125',
  'lat': '51.920211518183351',
  'lon': '26.706042898818851',
  'time': '2012-01-17T11:44:35Z'},
 {'alt': '-43.6026611328125',
  'lat': '51.920243117958307',
  'lon': '26.706140967085958',
  'time': '2012-01-17T11:45:21Z'}]
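
One caveat when moving from the snippet above to a real .tcx file: ElementTree includes the XML namespace in every tag name, so plain paths like 'Position/LatitudeDegrees' will not match if the file declares a default namespace. The following is a minimal sketch under two assumptions that are not in the question: that the file uses the Garmin TrainingCenterDatabase v2 namespace URI shown below, and that the simplified sample stands in for the real nesting. Check the xmlns attribute at the top of your own file before relying on it.

```python
import xml.etree.ElementTree as etree

# Assumed namespace URI; verify it against the xmlns attribute in your file.
TCX_URI = 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2'
NS = {'tcx': TCX_URI}

# Simplified stand-in for a real file (real files nest Trackpoint more deeply).
sample = """<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
    <Position>
        <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
        <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
    </Position>
</Trackpoint>
</TrainingCenterDatabase>"""

root = etree.fromstring(sample)

points = []
# iter() walks the whole tree, so the nesting depth of Trackpoint does not matter.
for tp in root.iter('{%s}Trackpoint' % TCX_URI):
    points.append({
        'time': tp.findtext('tcx:Time', namespaces=NS),
        'lat': tp.findtext('tcx:Position/tcx:LatitudeDegrees', namespaces=NS),
        'lon': tp.findtext('tcx:Position/tcx:LongitudeDegrees', namespaces=NS),
    })

print(points)
```

findtext returns None when an element is missing, so a trackpoint without a Position does not raise an exception here.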

Upvotes: 2

unutbu

Reputation: 880987

Perhaps you are looking for zip:

import xml.dom.minidom as minidom
import os

doc = minidom.parse(os.path.expanduser('~/test/gps.tcx'))
latitudes = doc.getElementsByTagName("LatitudeDegrees")
longitudes = doc.getElementsByTagName("LongitudeDegrees")
time = doc.getElementsByTagName("Time")
trackpoint = doc.getElementsByTagName("Trackpoint")

for t,lat,lon in zip(time,latitudes,longitudes):
    print(t.firstChild.data, lat.firstChild.data, lon.firstChild.data)
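
Note that zip pairs items purely by their position in each list, so this only lines up if every Trackpoint contains all three elements. The sketch below uses a made-up variant of the question's sample (the first trackpoint has no Position, which is an assumption for illustration) to show what happens otherwise:

```python
from xml.dom.minidom import parseString

# Hypothetical data: the first Trackpoint is missing its Position element.
doc = parseString("""<root>
<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
</Trackpoint>
<Trackpoint>
    <Time>2012-01-17T11:45:21Z</Time>
    <Position>
        <LatitudeDegrees>51.920243117958307</LatitudeDegrees>
        <LongitudeDegrees>26.706140967085958</LongitudeDegrees>
    </Position>
</Trackpoint>
</root>""")

times = doc.getElementsByTagName("Time")
latitudes = doc.getElementsByTagName("LatitudeDegrees")

# zip stops at the shorter list and ignores which Trackpoint each node came from.
pairs = [(t.firstChild.data, lat.firstChild.data)
         for t, lat in zip(times, latitudes)]

print(pairs)
# [('2012-01-17T11:44:35Z', '51.920243117958307')]
```

The first timestamp ends up paired with the second trackpoint's latitude, and no error is raised. If your file can contain incomplete trackpoints, looking up the children inside each Trackpoint avoids this.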

Upvotes: 0

Rob Wouters

Reputation: 16327

First find all the Trackpoint elements and loop over them. Then, inside the loop, find the child elements you want within each Trackpoint element:

from xml.dom.minidom import parse

doc = parse('in.tcx')

trackpoints = doc.getElementsByTagName("Trackpoint")
result = []
elements = ('Time', 'LatitudeDegrees', 'LongitudeDegrees')
for tp in trackpoints:
    obj = {}
    for el in elements:
        obj[el] = tp.getElementsByTagName(el)[0].firstChild.data
    result.append(obj)


print(result)
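
To try the same idea without a file on disk, here is a minimal self-contained sketch that runs against the sample from the question. The root wrapper element and the text_of helper are additions for illustration only; the helper returns None instead of raising when a child element is missing or empty.

```python
from xml.dom.minidom import parseString

# The question's sample, wrapped in a <root> element so it parses as one document.
SAMPLE = """<root>
<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
    <Position>
        <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
        <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
<Trackpoint>
    <Time>2012-01-17T11:45:21Z</Time>
    <Position>
        <LatitudeDegrees>51.920243117958307</LatitudeDegrees>
        <LongitudeDegrees>26.706140967085958</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
</root>"""


def text_of(parent, tag):
    """Hypothetical helper: text of the first <tag> under parent, or None."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data


doc = parseString(SAMPLE)

rows = []
for tp in doc.getElementsByTagName("Trackpoint"):
    # Each lookup is scoped to this Trackpoint, so the values stay together.
    rows.append((text_of(tp, "Time"),
                 text_of(tp, "LatitudeDegrees"),
                 text_of(tp, "LongitudeDegrees")))

for time, lat, lon in rows:
    print(time, lat, lon)
```

Each tuple holds the time, latitude and longitude of one trackpoint, in document order, which is what the while loop in the question was producing by index.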

Upvotes: 4
