Anant Sangar
Anant Sangar

Reputation: 11

How to convert xml into a strucutured csv

I have the following xml data and I want to convert it into a csv:

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country2 name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country2>
    <country3 name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country3>
</data>

How can I get each country line represented as 1 row in csv with all the attributes inside each country. I can't figure out how to read multiple countries.

I have the following code:

import os
from xml.etree import ElementTree as ET



rootElement = ET.parse("/Users/testuser/Desktop/test.XML").getroot()


with open('/Users/testuser/Desktop/output.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, lineterminator='\n')


    for subelement in rootElement:
        for subsub in subelement:
            print subsub.tag
            writer.writerows(subsub.items())
            for subsubsub in subsub:
                print subsubsub.items()
                writer.writerows(subsubsub.items())

This doesn't give me the desired output unfortunatately. In my CSV I want each row to represent a country. So row 1 would be country, row 2 would be country2. And then each column should give me the value of the subelements. So in my csv A1 would be country/name, A2 would be country/rank and so on. B1 would be country2/name, B2 would be country2/rank..

currently I get the following output:

enter image description here

Upvotes: 0

Views: 111

Answers (1)

Pavan
Pavan

Reputation: 348

import csv
import xml.etree.ElementTree as ET
tree = ET.parse('/tmp/test123.xml')
root = tree.getroot()
doc = open('/tmp/output123.csv', 'w')
for child in root:
    for country in root.findall(str(child.tag)):
        rec = '%s,%s,%s' %(str(child.tag), str(country.find('rank').text), str(country.find('gdppc').text))
    for neighbor in country.findall('neighbor'):
        rec= rec+ ',' + str(neighbor.attrib['name']) +',' + str(neighbor.attrib['direction'])
    doc.write(rec + '\n')
doc.close()

output.csv

country,2,141100,Austria,E,Switzerland,W
country2,5,59900,Malaysia,N
country3,69,13600,Costa Rica,W,Colombia,E

Upvotes: 2

Related Questions