Reputation: 9
I'm trying to take an API call response and parse the XML data into list, but I am struggling with the multiple child/parent relationships.
My hope is to export a new XML file that would line up each job ID and tracking number, which I could then import into Excel.
The source XML file looks like this:
<project>
<name>October 2019</name>
<jobs>
<job>
<id>5654206</id>
<tracking>
<mailPiece>
<barCode>00270200802095682022</barCode>
<address>Accounts Payable,1661 Knott Ave,La Mirada,CA,90638</address>
<status>En Route</status>
<dateTime>2019-10-12 00:04:21.0</dateTime>
<statusLocation>PONTIAC,MI</statusLocation>
</mailPiece>
</tracking>...
Code:
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, SubElement
tree = ET.parse('mailings.xml')
root = tree.getroot()
print(root.tag)
for x in root[1].findall('job'):
id=x.find('id').text
tracking=x.find('tracking').text
print(root[1].tag,id,tracking)
The script currently returns the following:
jobs 5654206 None
jobs 5654203 None
Upvotes: 0
Views: 2084
Reputation: 11639
I am struggling with the multiple child/parent relationships.
The right way to resolve this yourself is through using a debugger. For example, with VS Code, after applying a breakpoint and running the script with the debugger, it will stop at the breakpoint and I can inspect all the variables in memory, and run commands at the debug console just as if they were in my script. The Variable windows output looks like this:
There are various ways to do this at the command-line, or with a REPL like iPython, etc., but I find using debugging in a modern IDE environment like VS Code or PyCharm are definitely the way to go. Their debuggers remove the need to pepper print statements everywhere to test out your code, rewriting your code to expose more variables that must be printed to the console, etc.
A debugger allows you to see all the variables as a snapshot, exactly how the Python interpreter sees them, at any point in your code execution. You can:
find
methodInspecting the variables as I step through your code, it appears that the find() method was walking the children within an Element at all levels, not just at the top level. When you used x.find('tracking')
it is finding the mailPiece
nodes directly. If you print the tag
property, instead of the text
property, you will see it is 'mailPiece'
(see the debug windows above).
So, one way to resolve your issue is to store each mailPiece element as a variable, then pull out the individual attributes you want from it (i.e. BarCode, address, etc.) using find.
Here is some code that pulls all of this into a combined hierarchy of lists and dictionaries that you can then use to build your Excel outputs.
Note: The most efficient way to do this is line-by-line as you read the xml, but this is better for readability, maintainability, and if you need to do any post-processing that requires knowledge of more than one node at a time.
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, SubElement
from types import SimpleNamespace
tree = ET.parse('mailings.xml')
root = tree.getroot()
jobs = []
for job in root[1].findall('job'):
jobdict = {}
jobdict['id'] = job.find('id').text
jobdict['trackingMailPieces'] = []
for tracking in job.find('tracking'):
if tracking.tag == 'mailPiece':
mailPieceDict = {}
mailPieceDict['barCode'] = tracking.find('barCode').text
mailPieceDict['address'] = tracking.find('address').text
mailPieceDict['status'] = tracking.find('status').text
mailPieceDict['dateTime'] = tracking.find('dateTime').text
mailPieceDict['statusLocation'] = tracking.find('statusLocation').text
jobdict['trackingMailPieces'].append(mailPieceDict)
jobs.append(jobdict)
for job in jobs:
print('Job ID: {}'.format(job['id']))
for mp in job['trackingMailPieces']:
print(' mailPiece:')
for key, value in mp.items():
print(' {} = {}'.format(key, value))
The result is:
Job ID: 5654206
mailPiece:
barCode = 00270200802095682022
address = Accounts Payable,1661 Knott Ave,La Mirada,CA,90638
status = En Route
dateTime = 2019-10-12 00:04:21.0
statusLocation = PONTIAC,MI
I didn't address what to do with the output as that is beyond the scope of this question, but consider writing out to a CSV file, or even directly to an Excel file, if you don't need to pass on the XML to another program for some reason. There are Python packages that handle writing CSV and Excel files.
No need to create an intermediate format that you then need to manipulate after bringing it into Excel, for example.
Upvotes: 1