How to apply xmlTree iterparse to nested XML set

Question

I am trying to replicate the example from this tutorial, but using iterparse with elem.clear().

XML example:



    
        
            
                
                    
                        
                            
                                3.98749e-05
                                
                                
                                    60
                                    3.5
                                
                            
                            
                                0.000285263
                                
                                
                                    60
                                    3.5

The expected output is a table with the columns Year, Gas, Value, Technology, Crop and Country. I am trying to parse the XML using the following code:

import os
import xml.etree.cElementTree as etree
import codecs
import csv

PATH = 'D:\Book1'
FILENAME_BIO = 'Test.csv'
FILENAME_XML = 'all_aglu_emissions.xml'
ENCODING = "utf-8"


pathBIO = os.path.join(PATH, FILENAME_BIO)
pathXML = os.path.join(PATH, FILENAME_XML)

with codecs.open(pathBIO, "w", ENCODING) as bioFH:
    bioWriter = csv.writer(bioFH, quoting=csv.QUOTE_MINIMAL)
    bioWriter.writerow(['Year','Gas', 'Value','Technology','Crop','Country'])

    for event, elem in etree.iterparse(pathXML, events=('start','end')):
        if event == 'start' and elem.tag == 'region':
            str1 = elem.attrib['name']
        elif event == 'start' and elem.tag == 'AgSupplySector':
            str2 = elem.attrib['name']
        elif event == 'start' and elem.tag == 'AgProductionTechnology':
            str3 = elem.attrib['name']
        elif event == 'start' and elem.tag == 'period':
            str4 = elem.attrib['year']
        elif event == 'start' and elem.tag == 'Non-CO2':
            str5 = elem.attrib['name']
        elif event == 'end' and elem.tag == 'input-emissions':
            for em in elem.iter('input-emissions'):
                str6 = em.text
                bioWriter.writerow([str4, str5, str6, str3, str2, str1])
            
            elem.clear()

My issue is that I get extra rows in which the str6 field is empty. I probably have a nesting problem here. Please help.

Tomalak · Accepted Answer

The for em in elem.iter('input-emissions') loop is useless, drop it. At that point elem already is the input-emissions element, so you can read elem.text directly.
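To illustrate why the loop adds nothing, here is a minimal sketch (the one-element XML string is made up for the demonstration). Element.iter() starts at the element it is called on, so for an input-emissions element without nested elements of the same name it yields only that element:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-alone element, mimicking one <input-emissions> node.
elem = ET.fromstring('<input-emissions>3.98749e-05</input-emissions>')

# iter() walks the subtree rooted at elem, and that includes elem itself.
matches = list(elem.iter('input-emissions'))

print(len(matches))        # number of matches found
print(matches[0] is elem)  # whether the single match is elem itself
```

Looping over that result therefore does the same work as using elem directly.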

import os
import xml.etree.ElementTree as etree
import csv

PATH = '.'
FILENAME_BIO = 'Test.csv'
FILENAME_XML = 'all_aglu_emissions.xml'


pathBIO = os.path.join(PATH, FILENAME_BIO)
pathXML = os.path.join(PATH, FILENAME_XML)

with open(pathBIO, 'w', encoding='utf8', newline='') as bioFH:
    bioWriter = csv.writer(bioFH, quoting=csv.QUOTE_MINIMAL)
    bioWriter.writerow('Year Gas Value Technology Crop Country'.split())

    for event, elem in etree.iterparse(pathXML, events=('start',)):
        if elem.tag == 'region':
            str1 = elem.attrib['name']
        elif elem.tag == 'AgSupplySector':
            str2 = elem.attrib['name']
        elif elem.tag == 'AgProductionTechnology':
            str3 = elem.attrib['name']
        elif elem.tag == 'period':
            str4 = elem.attrib['year']
        elif elem.tag == 'Non-CO2':
            str5 = elem.attrib['name']
        elif elem.tag == 'input-emissions':
            str6 = elem.text
            bioWriter.writerow([str4, str5, str6, str3, str2, str1])
        elem.clear()

There are some other subtle changes I made to the code, since I assume you're using Python 3 for this. They include using xml.etree.ElementTree instead of the obsolete xml.etree.cElementTree (deprecated since Python 3.3 and removed in 3.9), skipping the codecs module (the built-in open() in Python 3 accepts an encoding argument) and passing the newline='' parameter to the open() call, so the csv module can handle newlines correctly by itself.
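As a small illustration of the newline='' point, the sketch below writes two rows to a temporary file and reads the raw bytes back. The helper name write_rows and the file name demo.csv are made up for this example. csv.writer ends each row with '\r\n' by default, and newline='' keeps open() from translating that sequence again (which on Windows would otherwise produce '\r\r\n').

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows as CSV, leaving line endings entirely to the csv module."""
    # newline='' disables newline translation in open(), so the '\r\n'
    # emitted by csv.writer reaches the file unchanged on every platform.
    with open(path, 'w', encoding='utf8', newline='') as fh:
        csv.writer(fh, quoting=csv.QUOTE_MINIMAL).writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.csv')  # hypothetical file name
    write_rows(path, [['Year', 'Gas'], ['1975', 'NOx_AWB']])
    with open(path, 'rb') as fh:
        raw = fh.read()

print(raw)
```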

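For this input, listening to the start event alone produces the desired output, so I've dropped handling the end event entirely. Be aware that iterparse only guarantees an element's attributes at its start event; text and children may not be populated yet, so on larger files it is safer to read elem.text at the end event.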
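If you would rather not depend on the text being available at the start event, a more defensive variant is sketched below. It is not the code from the answer above: it reads attributes at start events, reads elem.text and calls elem.clear() only at end events. The tag and attribute names come from the question's code, while the sample document, its data root element, its nesting, and the names CONTEXT_ATTRS and extract_rows are assumptions made for this illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names are taken from the question's code,
# the <data> root and the exact nesting are assumptions.
SAMPLE_XML = b"""<data>
  <region name="USA">
    <AgSupplySector name="Corn">
      <AgProductionTechnology name="Corn_NelsonR">
        <period year="1975">
          <Non-CO2 name="SO2_1_AWB">
            <input-emissions>3.98749e-05</input-emissions>
          </Non-CO2>
          <Non-CO2 name="NOx_AWB">
            <input-emissions>0.000285263</input-emissions>
          </Non-CO2>
        </period>
      </AgProductionTechnology>
    </AgSupplySector>
  </region>
</data>"""

# tag -> (attribute to read, CSV column that the attribute fills)
CONTEXT_ATTRS = {
    'region': ('name', 'Country'),
    'AgSupplySector': ('name', 'Crop'),
    'AgProductionTechnology': ('name', 'Technology'),
    'period': ('year', 'Year'),
    'Non-CO2': ('name', 'Gas'),
}


def extract_rows(source):
    """Yield [Year, Gas, Value, Technology, Crop, Country] rows."""
    context = {}
    for event, elem in ET.iterparse(source, events=('start', 'end')):
        if event == 'start':
            # Attributes are already available when the start event fires.
            if elem.tag in CONTEXT_ATTRS:
                attr, column = CONTEXT_ATTRS[elem.tag]
                context[column] = elem.attrib[attr]
        else:
            # At the end event the element, including its text, is complete.
            if elem.tag == 'input-emissions':
                yield [context['Year'], context['Gas'], elem.text,
                       context['Technology'], context['Crop'],
                       context['Country']]
            # The element is finished, so its content can be released.
            elem.clear()


rows = list(extract_rows(io.BytesIO(SAMPLE_XML)))
for row in rows:
    print(','.join(row))
```

Each yielded row can be passed to bioWriter.writerow() exactly as in the code above.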

The result is

Year,Gas,Value,Technology,Crop,Country
1975,SO2_1_AWB,3.98749e-05,Corn_NelsonR,Corn,USA
1975,NOx_AWB,0.000285263,Corn_NelsonR,Corn,USA
