Python - Converting xml to csv using Python pandas

Question

I am new in here and I have been trying to create a small python script to convert xml to csv. Based on my reading various post here in Stackoverflow I have managed to come up with a sample code that works just fine.. However the data I am trying to work with has multiple layers and thus I am unsure how to extract the data at the leaf level.

Given below is how the data looks like:

I am trying to use the below code to try converting the xml to csv

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()
final = {}
for elem in root:
    if len(elem):
        for c in elem.getchildren():
            final[c.tag] = c.text
    else:
        final[elem.tag] = elem.text

df = pd.DataFrame([final])
df.to_csv('file.csv)

This code however just pulls level2 and not ColA from level6.

Expected Output:

Transmission,TransmissionBody,level1,level2,level3,level4,level5,level6,ColA,ColB
,,,,,,,,ABC,123
,,,,,,,,DEF,456

Updated code:

allFiles = glob.glob(folder)
for file in allFiles:
    xmllist = [file]
    for xmlfile in xmllist:
        tree = ET.parse(xmlfile)
        root = tree.getroot()

        def f(elem, result):
            result[elem.tag] = elem.text
            cs = elem.getchildren()
            for c in cs:
                result = f(c, result)
            return result

         d = f(root, {})
         df = pd.DataFrame(d, index=['values'])

perl · Accepted Answer

If I understood your question correctly, you need to traverse the XML tree, so you probably want to have a recursive function that does that. Something like the following:

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()

def f(elem, result):
    result[elem.tag] = elem.text
    cs = elem.getchildren()
    for c in cs:
        result = f(c, result)
    return result

d = f(root, {})
df = pd.DataFrame(d, index=['values']).T
df

Out:

    values
Transmission    

TransmissionBody    

level1  

level2  

level3  

level4  

level5  

level6  

ColA    ABC
ColB    123

Update: Here's when we need to do it on multiple XML files. I've added another file similar to the original one with ColA, ColB rows replaced with

DEF
456

Here's the code:

def f(elem, result):
    result[elem.tag] = elem.text
    cs = elem.getchildren()
    for c in cs:
        result = f(c, result)
    return result

result = {}
for file in glob.glob('*.xml'):
    tree = ET.parse(file)
    root = tree.getroot()
    result = f(root, result)

df = pd.DataFrame(result, index=['values']).T
df

And the output:

                    0    1
Transmission       
   

TransmissionBody   
   

level1             
   

level2             
   

level3             
   

level4             
   

level5             
   

level6             
   

ColA              ABC  DEF
ColB              123  456

Python - Converting xml to csv using Python pandas

Answers (2)

Related Questions