Parse xml data with multiple roots in python

Question

I'm making an API call that returns multiple XML responses, like so:

I want to parse all the action IDs from the tag and add them to a list:

import requests
import xml.etree.ElementTree as ET
url = ""
payload = ""
headers = {}
response = requests.post(url, headers=headers, data=payload)

root = ET.fromstring(response.content)
actionidlist = []
for elem in root.iter('Action'):
    for subelem in elem.iter('ID'):
        actionidlist.append(subelem.text)
        print(actionidlist)

I get errors though because there are multiple roots. How do I parse this?

Edit: By errors I mean, actionidlist seems to only contain the last ID and not the rest of the IDs.

joao · Accepted Answer

ET.fromstring() parses only a single XML document, which must have exactly one root element. If you try to parse your entire input data, with multiple roots, you get this error:

xml.etree.ElementTree.ParseError: junk after document element: line 9, column 0
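The error can be reproduced without making the API call. Here is a minimal sketch, assuming a hypothetical two-document string (the real response body is not shown in the question, so the element names other than Action and ID are made up):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two complete XML documents back to back.
two_roots = (
    "<Response><Action><ID>1</ID></Action></Response>\n"
    "<Response><Action><ID>2</ID></Action></Response>\n"
)

try:
    ET.fromstring(two_roots)
    error_message = None
except ET.ParseError as exc:
    # The parser stops at the second root element and reports it as junk.
    error_message = str(exc)

print(error_message)
```

The line and column in the message depend on where the second root starts in your data.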

So I suggest pre-processing the input data to split it into a list of XML responses, then parsing each one in turn. The code below assumes the responses are separated by blank lines:

import requests
import xml.etree.ElementTree as ET

url = ""
payload = ""
headers = {}
response = requests.post(url, headers=headers, data=payload)

# Split the input data into a list of strings (XML sections),
# using blank lines as the separator between sections.
# response.text is a str; response.content is bytes and cannot be joined with '\n'.
xml_sections = ['']
for line in response.text.splitlines():
    if line.strip():
        xml_sections[-1] += line + '\n'
    else:
        xml_sections.append('')

# Parse each XML section separately
actionidlist = []
for s in xml_sections:
    if not s:
        continue  # skip empty sections left by consecutive or trailing blank lines
    root = ET.fromstring(s)
    for elem in root.iter('Action'):
        for subelem in elem.iter('ID'):
            actionidlist.append(subelem.text)
print(actionidlist)

This produces the following output:

[' 123 ', ' 456 ', ' 789 ']
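A different approach, not the one used above, is to wrap the whole response in a single synthetic root so that it parses in one pass. The sketch below is only an illustration under some assumptions: the function name `extract_action_ids`, the `wrapper` element, and the sample data are all hypothetical, and it assumes the response contains no DOCTYPE declarations. It also strips the whitespace around each ID.

```python
import re
import xml.etree.ElementTree as ET


def extract_action_ids(raw_xml):
    """Return the stripped text of every Action/ID in a multi-root XML string."""
    # Remove per-document XML declarations; one is only legal at the very start.
    body = re.sub(r"<\?xml[^>]*\?>", "", raw_xml)
    # Wrap everything in a synthetic root (hypothetical name) so the
    # parser sees a single well-formed document.
    root = ET.fromstring(f"<wrapper>{body}</wrapper>")
    return [
        id_elem.text.strip()
        for action in root.iter("Action")
        for id_elem in action.iter("ID")
        if id_elem.text
    ]


# Hypothetical sample data standing in for response.text
sample = """<Response>
  <Action><ID> 123 </ID></Action>
</Response>

<Response>
  <Action><ID> 456 </ID></Action>
</Response>
"""

print(extract_action_ids(sample))  # ['123', '456']
```

This does not depend on blank lines between the responses, but it does require the concatenated responses to be well-formed once the declarations are removed.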
