Askeladden

Reputation: 11

Parsing XML values with xml.etree

Probably a very simple question, but I'm a Python/XML newbie and can't seem to find an answer that works for me.

I am trying to parse values from an XML response as follows:

#!/usr/bin/python3
from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
xmlstr = """<?xml version="1.0" encoding="utf-8"?>
<biwsXML_response type="find">
 <clientdata>
  <message></message>
  <query>jurnamn:Acme Ltd</query>
  <wpquery></wpquery>
  <wpfilter></wpfilter>
 </clientdata> 
 <records total="1">
  <record nr="1">
   <nummer>9990874474</nummer>
   <orgnr>9990874474</orgnr>
   <jurnamn>Acme1 Ltd</jurnamn>
   <ba_postort>T&#228;by</ba_postort>
   <abv_ugrupp></abv_ugrupp>
  </record>
  <record nr="2">
   <nummer>9890874474</nummer>
   <orgnr>9890874474</orgnr>
   <jurnamn>Acme2 Ltd</jurnamn>
   <ba_postort>T&#228;by</ba_postort>
   <abv_ugrupp></abv_ugrupp>
  </record>
 </records>
</biwsXML_response>
"""
biwsXML_response = ET.fromstring(xmlstr)
for records in list(biwsXML_response):
    orgnr = records.find('orgnr').text
    jurnamn = records.find('jurnamn').text
    print('orgnr: %s; jurnamn: %s' % (orgnr, jurnamn))

When I test I get the following error.

Traceback (most recent call last):
  File "read_xml_tst2.py", line 31, in <module>
    orgnr = records.find('orgnr').text
AttributeError: 'NoneType' object has no attribute 'text'

I understand that the lookup is returning None instead of an element, but I don't understand where the error is. Thanks for any help.

Upvotes: 1

Views: 86

Answers (3)

Tomalak

Reputation: 338158

You can't do that

records.find('orgnr').text

...well, you can, but it's a run-time error waiting to happen. There is no guarantee that .find() returns something useful, so you can't write code that ignores this fact.

In your case, the error occurs when the code is processing the clientdata element.

Either

  • store records.find('orgnr') in a variable and check that it is not None before you try to access it any further, or
  • use .findall() and a loop, so that the loop body is not executed when nothing was found, or
  • create a small helper function that gets the text of an element

Apart from that, .find() and .findall() both accept a limited subset of XPath. It might be better to search for specific elements right from the start.
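As a rough illustration of searching for specific elements directly, here is a minimal sketch. The XML is a trimmed-down stand-in for the response in the question, and the variable names are made up for the example. It uses an attribute predicate, which is part of the XPath subset ElementTree supports, plus the built-in findtext() method, which returns a default when the element is missing.

```python
from xml.etree import ElementTree as ET

# Trimmed-down stand-in for the response in the question.
xmlstr = """<biwsXML_response type="find">
 <records total="2">
  <record nr="1"><orgnr>9990874474</orgnr><jurnamn>Acme1 Ltd</jurnamn></record>
  <record nr="2"><orgnr>9890874474</orgnr><jurnamn>Acme2 Ltd</jurnamn></record>
 </records>
</biwsXML_response>"""

root = ET.fromstring(xmlstr)

# Select one record directly by its nr attribute.
second = root.find("./records/record[@nr='2']")

# findtext() returns the element's text, or the default if nothing matches.
name = second.findtext('jurnamn', default='N/A')
missing = second.findtext('no_such_tag', default='N/A')

print(name, missing)
```

Whether findtext() or a hand-written helper is the better fit probably depends on how much extra handling you need around missing values.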

All three suggestions combined below:

biwsXML_response = ET.fromstring(xmlstr)

def get_text(context_element, xpath, default=''):
    element = context_element.find(xpath)
    # Compare with None explicitly: an element without children is falsy.
    return element.text if element is not None else default

for record in biwsXML_response.findall('./records/record'):
    orgnr = get_text(record, './orgnr', 'N/A')
    jurnamn = get_text(record, './jurnamn', 'N/A')

    print('orgnr: %s; jurnamn: %s' % (orgnr, jurnamn))

Upvotes: 0

James

Reputation: 36608

Iterating over biwsXML_response visits only its direct children, the <clientdata> and <records> elements.

You are then searching the direct children of each of these elements for the tag orgnr. This tag does not exist in clientdata at all, and in records it sits two levels down. So .find() returns None for both.
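To see this for yourself, a small sketch like the following may help. It uses a shortened copy of the XML from the question (one record only) and prints what the loop actually visits. The names child_tags and direct_hits are just for illustration.

```python
from xml.etree import ElementTree as ET

# Shortened copy of the response in the question (one record only).
xmlstr = """<biwsXML_response type="find">
 <clientdata>
  <query>jurnamn:Acme Ltd</query>
 </clientdata>
 <records total="1">
  <record nr="1">
   <orgnr>9990874474</orgnr>
   <jurnamn>Acme1 Ltd</jurnamn>
  </record>
 </records>
</biwsXML_response>"""

root = ET.fromstring(xmlstr)

# Iterating the root yields only its direct children.
child_tags = [child.tag for child in root]
print(child_tags)   # ['clientdata', 'records']

# find('orgnr') inspects only the direct children of each of those elements.
direct_hits = [child.find('orgnr') for child in root]
print(direct_hits)  # [None, None]
```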

The .find method does support XPath expressions, which is what you need to use to search deeper.

orgnr = biwsXML_response.find('.//orgnr').text
jurnamn = biwsXML_response.find('.//jurnamn').text

Keep in mind that .find only returns the first element it finds, not all of the matching elements. For that you need to use either .findall or .iterfind.
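The difference can be sketched roughly like this, again with a cut-down version of the XML from the question and illustrative variable names:

```python
from xml.etree import ElementTree as ET

# Cut-down version of the response in the question, with two records.
xmlstr = """<biwsXML_response>
 <records>
  <record nr="1"><orgnr>9990874474</orgnr><jurnamn>Acme1 Ltd</jurnamn></record>
  <record nr="2"><orgnr>9890874474</orgnr><jurnamn>Acme2 Ltd</jurnamn></record>
 </records>
</biwsXML_response>"""

root = ET.fromstring(xmlstr)

# .find stops at the first match anywhere below the root.
first_orgnr = root.find('.//orgnr').text

# .findall returns a list of every match, in document order.
all_orgnr = [el.text for el in root.findall('.//orgnr')]

# .iterfind yields the same matches lazily.
all_jurnamn = [el.text for el in root.iterfind('.//jurnamn')]

print(first_orgnr)   # 9990874474
print(all_orgnr)     # ['9990874474', '9890874474']
print(all_jurnamn)   # ['Acme1 Ltd', 'Acme2 Ltd']
```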

Upvotes: 0

AArias

Reputation: 2568

The problem is that you have to go further down in the XML tree. To get to orgnr, you first have to go into records and then into record.

This should help you:

for record in biwsXML_response.find('records').findall('record'):
    orgnr = record.find('orgnr').text
    jurnamn = record.find('jurnamn').text
    print('orgnr: %s; jurnamn: %s' % (orgnr, jurnamn))

Upvotes: 1
