Reputation: 11
Probably a very simple question, but I'm a Python/XML newbie and can't seem to find an answer that works for me.
I am trying to parse values from an XML response as follows:
#!/usr/bin/python3
from xml.etree import ElementTree as ET
xmlstr = """<?xml version="1.0" encoding="utf-8"?>
<biwsXML_response type="find">
<clientdata>
<message></message>
<query>jurnamn:Acme Ltd</query>
<wpquery></wpquery>
<wpfilter></wpfilter>
</clientdata>
<records total="1">
<record nr="1">
<nummer>9990874474</nummer>
<orgnr>9990874474</orgnr>
<jurnamn>Acme1 Ltd</jurnamn>
<ba_postort>Täby</ba_postort>
<abv_ugrupp></abv_ugrupp>
</record>
<record nr="2">
<nummer>9890874474</nummer>
<orgnr>9890874474</orgnr>
<jurnamn>Acme2 Ltd</jurnamn>
<ba_postort>Täby</ba_postort>
<abv_ugrupp></abv_ugrupp>
</record>
</records>
</biwsXML_response>
"""
biwsXML_response = ET.fromstring(xmlstr)
for records in list(biwsXML_response):
    orgnr = records.find('orgnr').text
    jurnamn = records.find('jurnamn').text
    print('orgnr: %s; jurnamn: %s' % (orgnr, jurnamn))
When I test I get the following error.
Traceback (most recent call last):
  File "read_xml_tst2.py", line 31, in <module>
    orgnr = records.find('orgnr').text
AttributeError: 'NoneType' object has no attribute 'text'
I understand that the lookup is returning None (hence 'NoneType'), but I don't understand where the error is. Thanks for any help.
Upvotes: 1
Views: 86
Reputation: 338158
You can't do that
records.find('orgnr').text
...well, you can, but it's a run-time error waiting to happen. There is no guarantee that .find() returns an element (it returns None when nothing matches), so you can't write code that ignores this fact.
In your case, the error occurs when the code is processing the clientdata element, which has no orgnr child.
Either store the result of records.find('orgnr') in a variable and check that it is not None before you try to access it any further, or use .findall() and a loop, so that the loop body is not executed when nothing was found.
Apart from that, .find() as well as .findall() can work with XPath expressions (ElementTree supports a limited subset). It might be better to search for specific elements right from the start.
All three suggestions combined below:
biwsXML_response = ET.fromstring(xmlstr)

def get_text(context_element, xpath, default=''):
    element = context_element.find(xpath)
    # Compare with None explicitly: an element without children is falsy
    return element.text if element is not None else default

# Select only the <record> elements, skipping <clientdata>
for record in biwsXML_response.findall('./records/record'):
    orgnr = get_text(record, './orgnr', 'N/A')
    jurnamn = get_text(record, './jurnamn', 'N/A')
    print('orgnr: %s; jurnamn: %s' % (orgnr, jurnamn))
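A self-contained sketch of the same idea that can be run on its own. The trimmed-down XML and the helper name get_text_or_default are assumptions for illustration only. It also covers elements that exist but are empty, since their .text is None.

```python
from xml.etree import ElementTree as ET

# Trimmed-down stand-in for the response in the question (illustrative data).
SAMPLE = """<biwsXML_response type="find">
  <clientdata><query>jurnamn:Acme Ltd</query></clientdata>
  <records total="2">
    <record nr="1"><orgnr>9990874474</orgnr><abv_ugrupp></abv_ugrupp></record>
    <record nr="2"><orgnr>9890874474</orgnr></record>
  </records>
</biwsXML_response>"""


def get_text_or_default(context_element, path, default=''):
    """Return the text of the first match, or default if missing or empty."""
    element = context_element.find(path)
    # Check against None explicitly instead of relying on truthiness.
    if element is None or element.text is None:
        return default
    return element.text


root = ET.fromstring(SAMPLE)
rows = [
    (get_text_or_default(record, './orgnr', 'N/A'),
     get_text_or_default(record, './abv_ugrupp', 'N/A'))
    for record in root.findall('./records/record')
]
print(rows)
```

The first record has an empty abv_ugrupp and the second has none at all; both end up with the default.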
Upvotes: 0
Reputation: 36608
Iterating over the response visits its direct children, the <clientdata> and <records> elements.
You are then searching the next layer of each of these elements for elements with the tag orgnr. This tag does not exist in clientdata at all and is 2 layers deep in records, so .find returns None for both.
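To see this concretely, here is a quick sketch that lists the root's direct children and what .find('orgnr') gives for each of them. The shortened XML is an assumption standing in for the full response.

```python
from xml.etree import ElementTree as ET

# Shortened stand-in for the response in the question (illustrative data).
SAMPLE = """<biwsXML_response type="find">
  <clientdata><query>jurnamn:Acme Ltd</query></clientdata>
  <records total="1">
    <record nr="1"><orgnr>9990874474</orgnr></record>
  </records>
</biwsXML_response>"""

root = ET.fromstring(SAMPLE)

# Iterating an element yields only its direct children.
child_tags = [child.tag for child in root]

# A bare tag name in .find() searches direct children only,
# so neither <clientdata> nor <records> has a matching child.
found = {child.tag: child.find('orgnr') for child in root}

print(child_tags)
print(found)
```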
The .find method does support XPath expressions (a limited subset), which is what you need to use to search deeper.
orgnr = biwsXML_response.find('.//orgnr').text
jurnamn = biwsXML_response.find('.//jurnamn').text
Now keep in mind that .find only returns the first element that it finds, not all of the matching elements. For that you need to use either .findall or .iterfind.
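For example, collecting every match instead of only the first might look like this (the shortened XML is assumed for illustration):

```python
from xml.etree import ElementTree as ET

# Shortened stand-in for the response in the question (illustrative data).
SAMPLE = """<biwsXML_response type="find">
  <records total="2">
    <record nr="1"><orgnr>9990874474</orgnr><jurnamn>Acme1 Ltd</jurnamn></record>
    <record nr="2"><orgnr>9890874474</orgnr><jurnamn>Acme2 Ltd</jurnamn></record>
  </records>
</biwsXML_response>"""

root = ET.fromstring(SAMPLE)

# .find() stops at the first match in document order.
first_orgnr = root.find('.//orgnr').text

# .findall() returns a list of all matches.
all_orgnr = [element.text for element in root.findall('.//orgnr')]

# .iterfind() yields the same matches lazily.
all_jurnamn = [element.text for element in root.iterfind('.//jurnamn')]

print(first_orgnr, all_orgnr, all_jurnamn)
```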
Upvotes: 0
Reputation: 2568
The problem is that you have to go further down in the XML tree. To get to orgnr you first have to go into records and then into record.
This should help you:
for record in biwsXML_response.find('records').findall('record'):
    orgnr = record.find('orgnr').text
    jurnamn = record.find('jurnamn').text
    print('orgnr: %s; jurnamn: %s' % (orgnr, jurnamn))
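A possible variation on that loop, as a sketch: .findtext() returns a default instead of failing when a child element is missing. The shortened XML is assumed for illustration, and the second record deliberately lacks jurnamn.

```python
from xml.etree import ElementTree as ET

# Shortened stand-in for the response in the question (illustrative data).
SAMPLE = """<biwsXML_response type="find">
  <clientdata><query>jurnamn:Acme Ltd</query></clientdata>
  <records total="2">
    <record nr="1"><orgnr>9990874474</orgnr><jurnamn>Acme1 Ltd</jurnamn></record>
    <record nr="2"><orgnr>9890874474</orgnr></record>
  </records>
</biwsXML_response>"""

root = ET.fromstring(SAMPLE)

lines = []
for record in root.find('records').findall('record'):
    # findtext() gives the child's text, or the default if there is no such child.
    orgnr = record.findtext('orgnr', default='N/A')
    jurnamn = record.findtext('jurnamn', default='N/A')
    lines.append('orgnr: %s; jurnamn: %s' % (orgnr, jurnamn))

print('\n'.join(lines))
```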
Upvotes: 1