Read xml file to dataframe in python

Question

I got a xml file need to read in python as dataframe, it is a part of the xml code:



    
        
            173915
            14712
            321255
        
        
            88942.0
            88162.0
            80553.0
        
        
            58 Middleton Street
            24 Royena Road
            9 Cafardi Boulevard
        
        
           7480000.0
           7728000.0
           7659000.0

I try some easier sample data to test my read function, they work. But when I use my function to do this xml file it only produce None in output. I think it might be col names, but I don't know how to fix it, could anyone help me? The function I used is:

import pandas as pd
import xml.etree.ElementTree as et

def parse_xml(xml_file, df_cols): 
    
    xtree = et.parse(xml_file)
    xroot = xtree.getroot()
    rows = []
    
    for node in xroot: 
        res = []
        res.append(node.attrib.get(df_cols[0]))
        for el in df_cols[1:]: 
            if node is not None and node.find(el) is not None:
                res.append(node.find(el).text)
            else: 
                res.append(None)
        rows.append({df_cols[i]: res[i] 
                     for i, _ in enumerate(df_cols)})
    
    out_df = pd.DataFrame(rows, columns=df_cols)
        
    return out_df

df_cols = ['index','property_id','addr_street','price']
parse_xml['myxmlfile.xml',df_cols]

Jonathan Leon · Accepted Answer

I think this is what you want. You should be able to put this in a function if you need

tree = et.parse('myxmlfile.xml')
root = tree.getroot()

df_cols = ['index','property_id','addr_street','price']
mlist = []
for col in df_cols:
    for d in root.findall('data'):
        # print(d.attrib)
        for c in d.findall(col):
            # print(c)
            # print(c.attrib)
            # print(c.attrib.get('id'))
            lst = []
            for itm in c.findall('item'):
                # print(itm.text)
                lst.append(itm.text)
        # print({col:lst})
        mlist.append(pd.DataFrame({col:lst}))
mlist
pd.concat(mlist, axis=1)

Output:

    index property_id          addr_street      price
0  173915     88942.0  58 Middleton Street  7480000.0
1   14712     88162.0       24 Royena Road  7728000.0
2  321255     80553.0  9 Cafardi Boulevard  7659000.0

Read xml file to dataframe in python

Answers (1)

Related Questions