ka4c
ka4c

Reputation: 99

Extract multiple xml attributes to pandas dataframe

I have a basic xml file called meals.xml which looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<meals name="Sample Text">
    <meal id="1" name="Poached Eggs" type="breakfast"/>
    <meal id="2" name="Club Sandwich" type="lunch"/>
    <meal id="3" name="Steak" type="dinner"/>
    <meal id="4" name="Steak" type="dinner"/>
</meals>

I want to extract both 'id' and 'name' attributes in to a dataframe. I can extract one when specifying one column and one attribute (eg, name only), but can't seem to figure out the syntax for getting multiple attributes in the for loop. This what I've tried, adding id to the 'df_cols' and 'attrib.get' function:

import xml.etree.ElementTree as ET
import pandas as pd
root = ET.parse('meals.xml').getroot()

df_cols = ["id", "name"]
rows = []

for node in root:
    value = node.attrib.get('id', 'name')
    rows.append(value)
 
df = pd.DataFrame(rows, columns = df_cols)   
df

Can someone advise how to do this?

Upvotes: 1

Views: 1664

Answers (1)

balderman
balderman

Reputation: 23815

The below may work for you

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<meals name="Sample Text">
    <meal id="1" name="Poached Eggs" type="breakfast"/>
    <meal id="2" name="Club Sandwich" type="lunch"/>
    <meal id="3" name="Steak" type="dinner"/>
    <meal id="4" name="Steak" type="dinner"/>
</meals>'''
root = ET.fromstring(xml)
data = [{'id': m.attrib['id'], 'name': m.attrib['name']} for m in root.findall('.//meal')]
df = pd.DataFrame(data)
print(df)

output

  id           name
0  1   Poached Eggs
1  2  Club Sandwich
2  3          Steak
3  4          Steak

Upvotes: 3

Related Questions