Reputation: 193
I have an xml file in the following format:
<table name="ecat">
<tuple>
<atom name="TaxAccNo1">test</atom>
<atom name="TaxAccNo2">00005504</atom>
<atom name="TaxAccNo3">1</atom>
<atom name="irn">100107</atom>
</tuple>
<tuple>
<atom name="TaxAccNo1">test</atom>
<atom name="TaxAccNo2">00005505</atom>
<atom name="TaxAccNo3">2</atom>
<atom name="irn">100108</atom>
</tuple>
<tuple>
<atom name="TaxAccNo1">test</atom>
<atom name="TaxAccNo2">00005506</atom>
<atom name="TaxAccNo3">3</atom>
<atom name="irn">100109</atom>
</tuple>
</table>
From this, I want to be able to generate a pandas dataframe that would look like this:
TaxAccNo1 TaxAccNo2 TaxAccNo3 irn
test 00005504 1 100107
test 00005505 2 100108
test 00005505 3 100109
How would I do this?
Upvotes: 2
Views: 752
Reputation: 375425
You can use python's xml and a list/dict comprehension:
In [11]: import xml
In [12]: e = xml.etree.ElementTree.parse(open("foo.xml"))
In [13]: pd.DataFrame([{a.get('name'): a.text for a in t} for t in e.findall("tuple")])
Out[13]:
TaxAccNo1 TaxAccNo2 TaxAccNo3 irn
0 test 00005504 1 100107
1 test 00005505 2 100108
2 test 00005506 3 100109
Upvotes: 3