Reputation: 17
I have the below xml that i would want to convert to csv ( delimiter like '|' or octal ) using python. I tried converting the xml into dict and then to csv . I am looking if there are any easy or efficient methods to do so.
Problems with the below code :
XML sample :
<?xml version="1.0" encoding="utf-8"?>
<xyz date="2019-07-01T09:00:29">
<abc>
<id>23</id>
<uniqueid>23_0</uniqueid>
<Name></Name>
<Rate>
<mrp>6.40000</mrp>
<discount>10.00%</discount>
<discountmonths>2</discountmonths>
</Rate>
<fee>
<type>off</type>
<minimumfee>£1,500.75</minimumfee>
<maxfee>£10K</maxfee>
</fee>
</abc>
<abc>
<id>35</id>
<uniqueid>35_0</uniqueid>
<Name></Name>
<Rate>
<mrp>7.90000</mrp>
<discount>5.00%</discount>
<discountmonths>5</discountmonths>
</Rate>
<fee>
<type>offer</type>
<minimumfee>£1k</minimumfee>
<maxfee>£22,000</maxfee>
</fee>
</abc>
</xyz>
Code : ( don't mind the immaturity in the code, just a beginner , mostly a copy paste from multiple sites.. trying to understand if this works and then thought i will know where to start or modify the existing )
import xml.etree.ElementTree as ET
import xmltodict
import csv
tree = ET.parse('myxml_1.xml')
xml_data = tree.getroot()
xmlstr = ET.tostring(xml_data, encoding='utf-8', method='xml')
data_dict = dict(xmltodict.parse(xmlstr))
with open('test1.csv','w') as f:
w = csv.writer(f)
w.writerow(data_dict.keys())
w.writerow(data_dict.values())
output expected :
id|uniqueid|Name|rate_mrp|rate_discount|rate_discountmonths|fee_type|fee_minimumfee|fee_maxfee
23|23_0||6.40000|10.00%|2|off|£1,500.75|£10K
35|35_0||7.90000|5.00%|5|offer|£1k|£22,000
Upvotes: 0
Views: 477
Reputation: 81594
I'd do this in a very explicit way rather than trying to hack xmltodict
to fit your needs.
The only downside I see with this approach is a bit of repetition with the hardcoded headers and tags names.
Also, I don't know how regular you input XML is going to be. If it's possible that some of the tags will not be present then you will need to add some error handling (because node.find
will return None
, then .text
will cause an AttributeError
).
rows = []
for abc_node in tree.findall('abc'):
rate_node = abc_node.find('Rate')
fee_node = abc_node.find('fee')
row = {'id': abc_node.find('id').text,
'uniqueid': abc_node.find('uniqueid').text,
'Name': abc_node.find('Name').text,
'rate_mrp': rate_node.find('mrp').text,
'rate_discount': rate_node.find('discount').text,
'rate_discountmonths': rate_node.find('discountmonths').text,
'fee_type': fee_node.find('type').text,
'fee_minimumfee': fee_node.find('minimumfee').text,
'fee_maxfee': fee_node.find('maxfee').text}
rows.append(row)
with open('test.csv', 'w', encoding='utf8') as f:
headers = ['id', 'uniqueid', 'Name', 'rate_mrp', 'rate_discount', 'rate_discountmonths',
'fee_type', 'fee_minimumfee', 'fee_maxfee']
dict_writer = csv.DictWriter(f, fieldnames=headers, lineterminator='\n')
dict_writer.writeheader()
dict_writer.writerows(rows)
Output
id,uniqueid,Name,rate_mrp,rate_discount,rate_discountmonths,fee_type,fee_minimumfee,fee_maxfee
23,23_0,,6.40000,10.00%,2,off,"£1,500.75",£10K
35,35_0,,7.90000,5.00%,5,offer,£1k,"£22,000"
If you want |
as delimiter just add delimiter='|'
to csv.DictWriter(f, fieldnames=headers, lineterminator='\n')
then the output is
id|uniqueid|Name|rate_mrp|rate_discount|rate_discountmonths|fee_type|fee_minimumfee|fee_maxfee
23|23_0||6.40000|10.00%|2|off|£1,500.75|£10K
35|35_0||7.90000|5.00%|5|offer|£1k|£22,000
Upvotes: 1