Reputation: 3
Trying to convert XML to CSV. I am new to python parsing.
Data sample ("Dummy data")
<users>
<user firstName="Hannah" lastName="Jones" age="21" sex="Female" retired="False" dependants="2" marital_status="married or civil partner" salary="20603" pension="0" company="Ward and Sons" commute_distance="6.56" address_postcode="N06 4LG"/>
<user firstName="Tracy" lastName="Rowley" age="50" sex="Female" retired="False" dependants="1" marital_status="single" salary="39509" pension="0" company="Fuller, King and Robinson" commute_distance="11.01" address_postcode="M1 6JD"/>
<user firstName="Shane" lastName="Thompson" age="87" sex="Male" retired="True" dependants="2" marital_status="single" salary="53134" pension="13409" company="N/A" commute_distance="0" address_postcode="WF84 1EA"/>
<user firstName="Michael" lastName="Anderson" age="85" sex="Male" retired="True" dependants="2" marital_status="married or civil partner" salary="58524" pension="39479" company="N/A" commute_distance="0" address_postcode="BN1 7TL"/>
</users>
I tried this code
import csv
import xml.etree.ElementTree as Xet
import pandas as pd
cols = ["firstName", "lastName", "age", "sex", "retired", "dependants", "marital_status", "salary", "pension", "company", "commute_distance", "address_postcode"]
rows = []
# Parsing the XML file
xmlparse = Xet.parse('/content/drive/MyDrive/DATAtask1/user_data.xml')
root = xmlparse.getroot()
for user in root:
firstName = user.find("firstName").text
lastName = user.find("lastName").text
age = user.find("age").text
sex = user.find("sex").text
retired = user.find("retired").text
dependants = user.find("dependants").text
marital_status = user.find("marital_status").text
salary = user.find("salary").text
pension = user.find("pension").text
company = user.find("company").text
commute_distance = user.find("commute_distance").text
address_postcode = user.find("address_postcode").text
rows.append({"firstName": firstName,
"lastName": lastName,
"age": age,
"sex": sex,
"retired": retired,
"dependants": dependants,
"marital_status": marital_status,
"pension": pension,
"salary": salary,
"company": company,
"commute_distance": commute_distance,
"address_postcode": address_postcode})
df = pd.DataFrame(rows, columns=cols)
# Writing dataframe to csv
df.to_csv('/content/drive/MyDrive/DATAtask1/XMLtoCSV.csv')
Getting this error
AttributeError Traceback (most recent call last)
<ipython-input-3-c6016197ed71> in <module>()
7 root = xmlparse.getroot()
8 for user in root:
----> 9 firstName = user.find("firstName").text
10 lastName = user.find("lastName").text
11 age = user.find("age").text
AttributeError: 'NoneType' object has no attribute 'text'
Upvotes: 0
Views: 79
Reputation: 2406
The tag has an attribute with firstName. Therefore you should use:
user.attrib['firstName']
If you check: user.attrib
, it will return a dictionary (this is not true, it returns lxml.etree._Attrib which can be converted to a dictionary using (dict(user.attrib)). That will give you an opportunity to make your code easier since you can just use the dictionary like a normal python dictionary.
For example you can create a list and append all dictionaries to the list. At the end it is possible to convert a list of dictionaries to a pandas dataframe:
d1 = {'name': 'john', 'age': 19}
d2 = {'name': 'Steve', 'age': 16}
# A dictionary with an extra key:
d3 = {'name': 'Jim', 'age': 25, 'additional': 'something'}
df = pd.DataFrame([d1, d2, d3])
name age additional
0 john 19 NaN
1 Steve 16 NaN
2 Jim 25 something
Upvotes: 1