Extract multiple xml attributes to pandas dataframe

Question

I have a basic xml file called meals.xml which looks like this:

I want to extract both 'id' and 'name' attributes in to a dataframe. I can extract one when specifying one column and one attribute (eg, name only), but can't seem to figure out the syntax for getting multiple attributes in the for loop. This what I've tried, adding id to the 'df_cols' and 'attrib.get' function:

import xml.etree.ElementTree as ET
import pandas as pd
root = ET.parse('meals.xml').getroot()

df_cols = ["id", "name"]
rows = []

for node in root:
    value = node.attrib.get('id', 'name')
    rows.append(value)
 
df = pd.DataFrame(rows, columns = df_cols)   
df

Can someone advise how to do this?

balderman · Accepted Answer

The below may work for you

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''

    
    
    
    
'''
root = ET.fromstring(xml)
data = [{'id': m.attrib['id'], 'name': m.attrib['name']} for m in root.findall('.//meal')]
df = pd.DataFrame(data)
print(df)

output

  id           name
0  1   Poached Eggs
1  2  Club Sandwich
2  3          Steak
3  4          Steak

Extract multiple xml attributes to pandas dataframe

Answers (1)

Related Questions