Reputation: 65
I'm running on Ubuntu 18.04. A Python 2 or 3 solution would be preferred. I've got XML structured like so:
<Records>
<Record>
<recID>123</recID>
<tstamp>2018-12-31T23:59:42.38Z</tstamp>
</Record>
<Record>
<recID>456</recID>
<tstamp>2018-10-10T12:03:02.28Z</tstamp>
</Record>
<Record>
<recID>789</recID>
<tstamp>2018-11-11T13:50:00.00Z</tstamp>
</Record>
</Records>
But I've got a lot of it: a single 10GB file. I'm looking for the most efficient way to sort the records on tstamp, such that the output would look like this:
<Records>
<Record>
<recID>456</recID>
<tstamp>2018-10-10T12:03:02.28Z</tstamp>
</Record>
<Record>
<recID>789</recID>
<tstamp>2018-11-11T13:50:00.00Z</tstamp>
</Record>
<Record>
<recID>123</recID>
<tstamp>2018-12-31T23:59:42.38Z</tstamp>
</Record>
</Records>
Thanks in advance.
Upvotes: 0
Views: 138
Reputation: 23825
The code below sorts the records by 'tstamp':
import datetime
import xml.etree.ElementTree as ET
xml = '''<Records>
<Record>
<recID>123</recID>
<tstamp>2018-12-31T23:59:42.38Z</tstamp>
</Record>
<Record>
<recID>456</recID>
<tstamp>2018-10-10T12:03:02.28Z</tstamp>
</Record>
<Record>
<recID>99</recID>
<tstamp>1999-11-11T13:50:00.00Z</tstamp>
</Record>
<Record>
<recID>88</recID>
<tstamp>2020-11-11T13:50:00.00Z</tstamp>
</Record>
<Record>
<recID>789</recID>
<tstamp>2018-11-11T13:50:00.00Z</tstamp>
</Record>
<Record>
<recID>11</recID>
<tstamp>2012-11-11T13:50:00.00Z</tstamp>
</Record>
</Records>'''
root = ET.fromstring(xml)
records = root.findall('.//Record')
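# Parse the full timestamp, including fractional seconds, so that
# records falling within the same second are still ordered correctly.
records = sorted(records, key=lambda r: datetime.datetime.strptime(r.find('tstamp').text, '%Y-%m-%dT%H:%M:%S.%fZ'))
for r in records:
    print(f'{r.find("tstamp").text} -- {r.find("recID").text}')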
root = ET.Element('Records')
root.extend(records)
# Relative path, so this also works on Ubuntu (adjust as needed)
ET.ElementTree(root).write('output.xml')
Output:
1999-11-11T13:50:00.00Z -- 99
2012-11-11T13:50:00.00Z -- 11
2018-10-10T12:03:02.28Z -- 456
2018-11-11T13:50:00.00Z -- 789
2018-12-31T23:59:42.38Z -- 123
2020-11-11T13:50:00.00Z -- 88
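The snippet above parses the whole document into memory, which may not be practical for the 10GB file in the question. One possible alternative is an external merge sort: stream the records with ET.iterparse, write sorted chunks to temporary files, then merge them with heapq.merge. This is a rough Python 3 sketch, not a tuned implementation. The function names (iter_records, write_sorted_chunk, read_chunk, sort_large_xml) and the chunk_size parameter are made up for illustration. It assumes that every tstamp uses the same UTC format as in the question (so plain string comparison matches chronological order) and that element values contain no newlines, since each record is stored on one line.

```python
import heapq
import os
import tempfile
import xml.etree.ElementTree as ET


def iter_records(path):
    """Yield (tstamp, one-line record XML) pairs without building the full tree."""
    context = ET.iterparse(path, events=('start', 'end'))
    _, root = next(context)  # first event is the start of <Records>
    for event, elem in context:
        if event == 'end' and elem.tag == 'Record':
            elem.tail = None  # tostring() would otherwise append the tail text
            # Assumption: values hold no newlines, so stripping them is safe
            xml_text = ET.tostring(elem, encoding='unicode').replace('\n', '')
            yield elem.find('tstamp').text, xml_text
            root.clear()  # drop already-processed records to limit memory use


def write_sorted_chunk(pairs, tmp_dir):
    """Sort one batch in memory and write it as 'tstamp<TAB>xml' lines."""
    pairs.sort(key=lambda p: p[0])
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix='.chunk', text=True)
    with os.fdopen(fd, 'w', encoding='utf-8') as f:
        for tstamp, xml_text in pairs:
            f.write(f'{tstamp}\t{xml_text}\n')
    return path


def read_chunk(path):
    """Lazily read (tstamp, xml) pairs back from a sorted chunk file."""
    with open(path, encoding='utf-8') as f:
        for line in f:
            tstamp, xml_text = line.rstrip('\n').split('\t', 1)
            yield tstamp, xml_text


def sort_large_xml(src, dst, chunk_size=1000000):
    """Hypothetical entry point: sort <Record> elements of src by tstamp into dst."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        chunks, batch = [], []
        for pair in iter_records(src):
            batch.append(pair)
            if len(batch) >= chunk_size:
                chunks.append(write_sorted_chunk(batch, tmp_dir))
                batch = []
        if batch:
            chunks.append(write_sorted_chunk(batch, tmp_dir))
        with open(dst, 'w', encoding='utf-8') as out:
            out.write('<Records>\n')
            # k-way merge of the sorted chunks; each chunk is read lazily
            merged = heapq.merge(*(read_chunk(c) for c in chunks),
                                 key=lambda p: p[0])
            for _, xml_text in merged:
                out.write(xml_text + '\n')
            out.write('</Records>\n')
```

It could be called as sort_large_xml('input.xml', 'output.xml'), where both file names are placeholders. chunk_size trades memory for the number of temporary files; since the merge keeps every chunk file open at once, the chunk count should stay below the system's open-file limit.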
Upvotes: 1