Chrestomanci
Chrestomanci

Reputation: 221

ElementTree Python: How to extract siblings if siblings are nested?

I have a XML file that I am trying to convert to an Excel dataset. The XML is arranged like this:

<XML Data>
    <Record>
        <ID>
            <Client id="01"></Client>
        </ID>
        <Service>
            <Product id="A"></Product>
            <Product id="B"></Product>
            <Product id="C"></Product>
        </Service>
    </Record>
    <Record>
        <ID>
            <Client id="02"></Client>
        </ID>
        <Service>
            <Product id="A"></Product>
            <Product id="B"></Product>
            <Product id="Y"></Product>
        </Service>
    </Record>
    <Record>
        <ID>
            <Client id="24"></Client>
        </ID>
        <Service>
            <Product id="U"></Product>
        </Service>
    </Record>
</XML Data>

As you can see, each record shows a single client with multiple services.

I'm trying to get this done using only ElementTree. This is incorrect code that returns ALL services for each client ID--I can't figure out how to get it to return each service that a client actually had:

for x in root.findall("Record/ID/Client"):
    client = x.get("id")
    for y in root.findall('.//Service/Product'):
        service = y.get("id")
        print(client, service)

I am trying to get it arranged like this in an CSV format:

ClientID    ServiceID
01          A
01          B
01          C
02          A
02          B
02          Y
24          U

Any advice would be greatly appreciated! I've looked up this, but could only find resources that show how to extract actual siblings--since the Client ID and Service ID are parents to the children I want to extract, this is proving to be a little more confusing. Thank you!

Upvotes: 1

Views: 661

Answers (1)

Daniel Haley
Daniel Haley

Reputation: 52888

Instead of first selecting Client, first select Record.

Then your second for loop can be changed from root.finall to x.findall will only find Product elements for the current Record.

Example...

XML Input (test.xml; fixed invalid root element)

<XML_Data>
    <Record>
        <ID>
            <Client id="01"></Client>
        </ID>
        <Service>
            <Product id="A"></Product>
            <Product id="B"></Product>
            <Product id="C"></Product>
        </Service>
    </Record>
    <Record>
        <ID>
            <Client id="02"></Client>
        </ID>
        <Service>
            <Product id="A"></Product>
            <Product id="B"></Product>
            <Product id="Y"></Product>
        </Service>
    </Record>
    <Record>
        <ID>
            <Client id="24"></Client>
        </ID>
        <Service>
            <Product id="U"></Product>
        </Service>
    </Record>
</XML_Data>

Python

import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')

root = tree.getroot()

for x in root.findall("Record"):
    client = x.find("ID/Client").get("id")
    for y in x.findall('.//Service/Product'):
        service = y.get("id")
        print(client, service)

Print Output

01 A
01 B
01 C
02 A
02 B
02 Y
24 U

Upvotes: 2

Related Questions