Nilani Algiriyage

Reputation: 35776

Python 3.x: Parse an XML file with namespaces using xml.etree

I am trying to parse a large XML file using xml.etree. Its structure matches the sample data in the code below.

I am particularly interested in extracting the References, each with its Title and Publisher.

The following is the code sample that I tried. It doesn't print anything. Any help is appreciated.

import xml.etree.ElementTree as et

data = """<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist" exist:hits="1" exist:start="1" exist:count="1" exist:compilation-time="0" exist:execution-time="0">
    <events>
        <paging page="9" pageNumberOfRecords="20" totalNumberOfRecords="215"/>
        <WeatherEvent xmlns="http://hwe.niwa.co.nz/schema/2011" xmlns:gml="http://www.opengis.net/gml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hwe.niwa.co.nz/schema/2011 ../hwe.xsd">
    <Identifier>November_2019_Timaru_Hail</Identifier>
    <Title> November 2019 Timaru Hail</Title>
    <StartDate>2019-11-20</StartDate> 
        <Abstract>A severe hailstorm over Timaru, with golf ball-sized hail stones, caused extensive damage to buildings and vehicles.</Abstract>
        <Notes/>
        <Regions>
            <Region name="Canterbury">
                <Hazards>
                    <Hazard type="Hail">
                        <Location name="Timaru">
                            <gml:Point gml:id="Timaru_1" srsName="urn:ogc:def:crs:EPSG:6.6:4326" srsDimension="2">
                                <gml:pos>-44.398445 171.255200</gml:pos>
                            </gml:Point>
                        </Location>
                        <Impacts>
                            <Impact type="InsuranceClaim" unit="$" value="130700000">Insurance claims totalled $130.7 million.</Impact>
                            
                            <Impact type="GeneralComment">Large hail stones smashed windows, pelted holes in roofs, damaged vehicles and forced the closure of businesses.</Impact>
                            <Impact type="GeneralComment">The Fire and Emergency NZ Mid-South Canterbury area commander said they had received 30 call-outs between noon and 2.40pm. Twenty one  of them were for hail or rain damage.</Impact>
                            <Impact type="GeneralComment">The South Canterbury Chamber of Commerce said there had been considerable damage and flooding with a number of businesses forced to close until their premises were secure and safe to open.  The Timaru library and the Aigantighe Art Gallery were both closed due to damage sustained.</Impact>
                            <Impact type="GeneralComment">A Timaru panel beating business estimated there were at least 10,000 vehicles in Timaru that were damaged by the hail.  Vehicles had dents, broken windscreens and broken wing mirrors.  The structural integrity of many of the damaged vehicles was found to be compromised.</Impact>
                            <Impact type="GeneralComment">An Australian-based team of hail damage repairers set up a base in Timaru to fix cars damaged in the hailstorm.  They anticipated that repairing hail-damaged cars in Timaru would take at least six months.</Impact>
                        </Impacts>
                    </Hazard>
                    <Hazard type="Hail">
                        <Location name="St Andrews">
                            <gml:Point gml:id="St_Andrews_1" srsName="urn:ogc:def:crs:EPSG:6.6:4326" srsDimension="2">
                                <gml:pos>-44.5301 171.1909</gml:pos>
                            </gml:Point>
                        </Location>
                        <Impacts>
                            <Impact type="GeneralComment">Federated Farmers reported there had been significant crop damage near St Andrews.</Impact>
                        </Impacts>
                    </Hazard>
                </Hazards>
            </Region>
        </Regions>
        <References>
            <Reference>
                <Title>Insurance Council of New Zealand (https://www.icnz.org.nz/natural-disasters/cost-of-natural-disasters/)</Title>
                <Type>Reference</Type>
            </Reference>            
            <Reference>
                <Title>Headline:  Giant hail stones hammer Timaru as storm moves up the country.</Title>
                <Type>Reference</Type>
                <Publisher>www.stuff.co.nz, 20 November 2019.  </Publisher>
            </Reference>
            <Reference>
                <Title>Headline:  Insurance companies face deluge of hail damage claims.</Title>
                <Type>Reference</Type>
                <Publisher>www.stuff.co.nz, 21 November 2019.  </Publisher>
            </Reference>
            <Reference>
                <Title>Headline:  Cars damaged in severe Timaru hailstorm failing warrents of fitness.</Title>
                <Type>Reference</Type>
                <Publisher>www.stuff.co.nz, 4 December 2019.  </Publisher>
            </Reference>
            <Reference>
                <Title>Headline:  Record insurance repairs for cars smashed by hail in Timaru.</Title>
                <Type>Reference</Type>
                <Publisher>www.stuff.co.nz, 23 December 2019.  </Publisher>
            </Reference>
            
        </References>
</WeatherEvent>
</events>
</exist:result>
"""

root = et.fromstring(data)

ns = {'exist':'http://exist.sourceforge.net/NS/exist', 'niwa':'http://hwe.niwa.co.nz/schema/2011'}


results = root.findall('exist:result', ns)
for event in results:
    weatherEvnt = event.find('niwa: events', ns)
    for WE in weatherEvnt:
        Ref = WE.find('niwa: WeatherEvent', ns)
        for x in Ref.find('niwa: References', ns):
            print(x.text)

Upvotes: 1

Views: 122

Answers (1)

kjhughes

Reputation: 111756

Problems include at least:

  1. The root is already exist:result, so the initial

    results = root.findall('exist:result', ns)
    

    returns an empty list because exist:result has no such children.

  2. There should be no space between the colon that follows a namespace prefix and the local name. E.g., niwa: events should be niwa:events, and likewise for the other paths.

  3. There are no text children of niwa:References.
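
The first and third points can be checked directly. The following is a minimal sketch, assuming a cut-down stand-in for the question's XML (the "Example title" text is a placeholder, not taken from the data). It also shows one further detail that appears to matter here: events is written without a prefix and sits outside the default namespace declared on WeatherEvent, so it would be matched as plain events rather than niwa:events.

```python
import xml.etree.ElementTree as et

# Cut-down stand-in for the question's XML; "Example title" is placeholder text.
data = """<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist">
  <events>
    <WeatherEvent xmlns="http://hwe.niwa.co.nz/schema/2011">
      <References>
        <Reference>
          <Title>Example title</Title>
        </Reference>
      </References>
    </WeatherEvent>
  </events>
</exist:result>
"""

ns = {'exist': 'http://exist.sourceforge.net/NS/exist',
      'niwa': 'http://hwe.niwa.co.nz/schema/2011'}
root = et.fromstring(data)

# Point 1: the parsed root *is* exist:result, so it has no exist:result children.
root_tag = root.tag
children_named_result = root.findall('exist:result', ns)

# <events> has no prefix and is outside the scope of the default namespace
# declared on <WeatherEvent>, so it is in no namespace at all.
events_plain = root.find('events')
events_niwa = root.find('niwa:events', ns)

# Point 3: niwa:References holds only child elements; its own text is whitespace.
references = root.find('.//niwa:References', ns)
references_text = references.text.strip()

print(root_tag)
print(children_named_result)
print(events_plain is not None, events_niwa is None)
print(repr(references_text))
```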

Not sure exactly what your end goal is, but this code,

import xml.etree.ElementTree as et

data = "" # As specified in question.

root = et.fromstring(data)
ns = {'exist':'http://exist.sourceforge.net/NS/exist',
      'niwa':'http://hwe.niwa.co.nz/schema/2011'}

for ref in root.findall('.//niwa:Title', ns):
    print('Title=' + ref.text)

will demonstrate successful selection of text in namespaced XML, and output:

Title= November 2019 Timaru Hail
Title=Insurance Council of New Zealand (https://www.icnz.org.nz/natural-disasters/cost-of-natural-disasters/)
Title=Headline:  Giant hail stones hammer Timaru as storm moves up the country.
Title=Headline:  Insurance companies face deluge of hail damage claims.
Title=Headline:  Cars damaged in severe Timaru hailstorm failing warrents of fitness.
Title=Headline:  Record insurance repairs for cars smashed by hail in Timaru.
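
If the goal is specifically each Reference's Title and Publisher, a sketch along the same lines might look like the following. It assumes a trimmed copy of the question's XML (only two references are kept), and the references list and the variable names are illustrative rather than part of the original code. Publisher is absent from some references in the data, so a missing one is kept as None.

```python
import xml.etree.ElementTree as et

# Trimmed copy of the question's XML; only two of the references are kept.
data = """<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist">
  <events>
    <WeatherEvent xmlns="http://hwe.niwa.co.nz/schema/2011">
      <Title> November 2019 Timaru Hail</Title>
      <References>
        <Reference>
          <Title>Insurance Council of New Zealand (https://www.icnz.org.nz/natural-disasters/cost-of-natural-disasters/)</Title>
          <Type>Reference</Type>
        </Reference>
        <Reference>
          <Title>Headline:  Giant hail stones hammer Timaru as storm moves up the country.</Title>
          <Type>Reference</Type>
          <Publisher>www.stuff.co.nz, 20 November 2019.  </Publisher>
        </Reference>
      </References>
    </WeatherEvent>
  </events>
</exist:result>
"""

ns = {'niwa': 'http://hwe.niwa.co.nz/schema/2011'}
root = et.fromstring(data)

# Collect (title, publisher) pairs; publisher is None when the element is absent.
references = []
for ref in root.findall('.//niwa:References/niwa:Reference', ns):
    title = ref.findtext('niwa:Title', default='', namespaces=ns).strip()
    publisher = ref.findtext('niwa:Publisher', default=None, namespaces=ns)
    if publisher is not None:
        publisher = publisher.strip()
    references.append((title, publisher))

for title, publisher in references:
    print('Title=' + title)
    print('Publisher=' + (publisher if publisher is not None else '(none)'))
```

Restricting the path to niwa:References/niwa:Reference skips the event's own top-level Title, which the broader .//niwa:Title search also returns.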

Upvotes: 1
