Reputation: 221
I have a XML file that I am trying to convert to an Excel dataset. The XML is arranged like this:
<XML Data>
<Record>
<ID>
<Client id="01"></Client>
</ID>
<Service>
<Product id="A"></Product>
<Product id="B"></Product>
<Product id="C"></Product>
</Service>
</Record>
<Record>
<ID>
<Client id="02"></Client>
</ID>
<Service>
<Product id="A"></Product>
<Product id="B"></Product>
<Product id="Y"></Product>
</Service>
</Record>
<Record>
<ID>
<Client id="24"></Client>
</ID>
<Service>
<Product id="U"></Product>
</Service>
</Record>
</XML Data>
As you can see, each record shows a single client with multiple services.
I'm trying to get this done using only ElementTree. This is incorrect code that returns ALL services for each client ID--I can't figure out how to get it to return each service that a client actually had:
for x in root.findall("Record/ID/Client"):
client = x.get("id")
for y in root.findall('.//Service/Product'):
service = y.get("id")
print(client, service)
I am trying to get it arranged like this in an CSV format:
ClientID ServiceID
01 A
01 B
01 C
02 A
02 B
02 Y
24 U
Any advice would be greatly appreciated! I've looked up this, but could only find resources that show how to extract actual siblings--since the Client ID and Service ID are parents to the children I want to extract, this is proving to be a little more confusing. Thank you!
Upvotes: 1
Views: 661
Reputation: 52888
Instead of first selecting Client
, first select Record
.
Then your second for loop can be changed from root.finall
to x.findall
will only find Product
elements for the current Record
.
Example...
XML Input (test.xml; fixed invalid root element)
<XML_Data>
<Record>
<ID>
<Client id="01"></Client>
</ID>
<Service>
<Product id="A"></Product>
<Product id="B"></Product>
<Product id="C"></Product>
</Service>
</Record>
<Record>
<ID>
<Client id="02"></Client>
</ID>
<Service>
<Product id="A"></Product>
<Product id="B"></Product>
<Product id="Y"></Product>
</Service>
</Record>
<Record>
<ID>
<Client id="24"></Client>
</ID>
<Service>
<Product id="U"></Product>
</Service>
</Record>
</XML_Data>
Python
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for x in root.findall("Record"):
client = x.find("ID/Client").get("id")
for y in x.findall('.//Service/Product'):
service = y.get("id")
print(client, service)
Print Output
01 A
01 B
01 C
02 A
02 B
02 Y
24 U
Upvotes: 2