natnay

Reputation: 490

Python - Convert XML element and child attributes to CSV

I have XML data in this format:

<Slot_Data Timestamp="08-18-2017 07:03:20.890">
    <Slot Id="1" Count="23" Error="4" />
    <Slot Id="2" Count="31" Error="0" />
    <Slot Id="3" Count="27" Error="2" />
</Slot_Data>
<Slot_Data Timestamp="08-18-2017 07:55:54.574">
    <Slot Id="1" Count="21" Error="0" />
    <Slot Id="2" Count="23" Error="3" />
    <Slot Id="3" Count="34" Error="1" />
</Slot_Data>

I'm trying to rearrange it into this format and write it to a CSV file:

Timestamp           Slot    Count   Error
08/18/17 07:03:21   1       23      4
08/18/17 07:03:21   2       31      0
08/18/17 07:03:21   3       27      2
08/18/17 07:55:55   1       21      0
08/18/17 07:55:55   2       23      3
08/18/17 07:55:55   3       34      1

I can get the child attributes into the CSV format above (minus the Timestamp) using etree:

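import csv
import xml.etree.ElementTree as ET

tree = ET.parse(xml_file)  # xml_file: path to the XML input
root = tree.getroot()

with open('output.csv', 'w', newline='') as f:  # output path is illustrative
    csvwriter = csv.writer(f)
    for line in root.iter('Slot'):
        # One row per <Slot>: Id, Count, Error (no Timestamp yet)
        row = [line.get('Id'), line.get('Count'), line.get('Error')]
        csvwriter.writerow(row)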

But I can't figure out how to also append the Timestamp of each Slot's parent Slot_Data element. I can print the timestamps easily using etree, but I'm not sure how to work them into the Python code above. Any ideas? Thanks!
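The target table also changes the timestamp format (MM/DD/YY) and rounds it to the nearest second (07:03:20.890 becomes 07:03:21). A minimal sketch using the standard datetime module, assuming every input timestamp has the MM-DD-YYYY HH:MM:SS.fff form shown in the sample and that halves round up; format_timestamp is a hypothetical helper name:

```python
from datetime import datetime, timedelta

def format_timestamp(raw):
    """Convert '08-18-2017 07:03:20.890' to '08/18/17 07:03:21'.

    Assumes the MM-DD-YYYY HH:MM:SS.fff input format from the sample data.
    """
    dt = datetime.strptime(raw, '%m-%d-%Y %H:%M:%S.%f')
    # Round half up to the nearest second: add 0.5 s, then drop the fraction.
    dt = (dt + timedelta(milliseconds=500)).replace(microsecond=0)
    return dt.strftime('%m/%d/%y %H:%M:%S')

print(format_timestamp('08-18-2017 07:03:20.890'))  # 08/18/17 07:03:21
print(format_timestamp('08-18-2017 07:55:54.574'))  # 08/18/17 07:55:55
```

The rounding step matters: simply truncating the fraction would give 07:03:20 rather than the 07:03:21 shown in the table.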

Upvotes: 0

Views: 1530

Answers (1)

eguaio

Reputation: 3954

I think the objectify module from the lxml library is the way to go.

from lxml import objectify

s = '''<document><Slot_Data Timestamp="08-18-2017 07:03:20.890">
    <Slot Id="1" Count="23" Error="4" />
    <Slot Id="2" Count="31" Error="0" />
    <Slot Id="3" Count="27" Error="2" />
</Slot_Data>
<Slot_Data Timestamp="08-18-2017 07:55:54.574">
    <Slot Id="1" Count="21" Error="0" />
    <Slot Id="2" Count="23" Error="3" />
    <Slot Id="3" Count="34" Error="1" />
</Slot_Data></document>'''

mo = objectify.fromstring(s)
lines_data = [(sd.get('Timestamp'), sl.get('Id'), sl.get('Count'), sl.get('Error'))
              for sd in mo.Slot_Data   # each <Slot_Data> element
              for sl in sd.Slot]       # each <Slot> inside it

Notice that I had to wrap the data in a <document> element to be able to parse the string: an XML document must have exactly one root element.
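The standard library's xml.etree.ElementTree has the same restriction: text with several top-level <Slot_Data> elements is rejected with a ParseError. A minimal sketch that wraps the raw text in a root element before parsing, assuming the whole input fits in memory and has no XML declaration (a <?xml ...?> line inside the wrapper would itself be invalid); parse_fragments and the <document> tag name are illustrative choices:

```python
import xml.etree.ElementTree as ET

def parse_fragments(text):
    """Parse XML text that has several top-level elements but no single root.

    Assumes the text has no XML declaration, which would be invalid once wrapped.
    """
    return ET.fromstring('<document>' + text + '</document>')

data = '''<Slot_Data Timestamp="08-18-2017 07:03:20.890">
    <Slot Id="1" Count="23" Error="4" />
</Slot_Data>
<Slot_Data Timestamp="08-18-2017 07:55:54.574">
    <Slot Id="1" Count="21" Error="0" />
</Slot_Data>'''

try:
    ET.fromstring(data)  # two top-level elements: not a well-formed document
except ET.ParseError as exc:
    print('without wrapper:', exc)

root = parse_fragments(data)
print(len(root.findall('Slot_Data')))  # 2
```

For a file, the same idea would be parse_fragments(f.read()) on the opened file.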

Now lines_data holds all the data you need as a list of tuples, and you can write it out using the csv library or by formatting it yourself. For example:

with open('myfile.csv', 'w') as f:
    file_contents = '\n'.join('%s,%s,%s,%s' % row for row in lines_data)
    f.write(file_contents + '\n')  # terminate the last line too
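If you'd rather avoid the lxml dependency, the same nested loop (outer loop over Slot_Data, inner loop over its Slot children) also works with the standard library's xml.etree.ElementTree, and the csv module handles quoting and line endings. A minimal sketch, assuming the input already has a single root element (here <document>, as in the string above); write_slots_csv is a hypothetical helper name, and it takes any writable text stream so it can be tested with io.StringIO:

```python
import csv
import io
import xml.etree.ElementTree as ET

def write_slots_csv(root, out):
    """Write one CSV row per <Slot>, prefixed by its parent's Timestamp."""
    writer = csv.writer(out)
    writer.writerow(['Timestamp', 'Slot', 'Count', 'Error'])  # header row
    for slot_data in root.iter('Slot_Data'):
        timestamp = slot_data.get('Timestamp')  # attribute of the parent
        for slot in slot_data.findall('Slot'):  # direct <Slot> children only
            writer.writerow([timestamp, slot.get('Id'),
                             slot.get('Count'), slot.get('Error')])

xml_text = '''<document><Slot_Data Timestamp="08-18-2017 07:03:20.890">
    <Slot Id="1" Count="23" Error="4" />
    <Slot Id="2" Count="31" Error="0" />
</Slot_Data></document>'''

buf = io.StringIO()
write_slots_csv(ET.fromstring(xml_text), buf)
print(buf.getvalue())
```

For a real file, open it with open('myfile.csv', 'w', newline='') (the csv module docs recommend newline='' for writers) and pass that file object as out.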

Upvotes: 2
