Reputation: 26462

Python XML parsing with ElementTree returns None

I'm trying to parse this XML with ElementTree in Python. The data is stored as a string:

xml = '''<?xml version="1.0" encoding="utf-8"?>
<SearchResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Student>
    <RollNumber>1</RollNumber>
    <Name>Abel</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>[email protected]</Email>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>2</RollNumber>
    <Name>Joseph</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>[email protected]</Email>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>3</RollNumber>
    <Name>Mike</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>[email protected]</Email>
    <Grade>7</Grade>
</Student>
</SearchResults>'''

This is the code I used to parse the string as XML:

from xml.etree import ElementTree

xml = ElementTree.fromstring(xml)

results = xml.findall('Student')

for students in results:
    for student in students:
        print student.get('Name')

print results prints out the results as Elements,

[<Element 'Student' at 0x7feb615b4ad0>, <Element 'Student' at 0x7feb615b4c50>, <Element 'Student' at 0x7feb615b4e10>]

inside the for loop, print students prints out the same,

<Element 'Student' at 0x7fd722d88ad0>
<Element 'Student' at 0x7fd722d88c50>
<Element 'Student' at 0x7fd722d88e10>

However, when I try to get the student's name with print student.get('Name'), the program prints None.

What I'm trying to do is pull the value of each tag out of the XML and construct a dict.

Upvotes: 4

Answers (2)

MattH

Reputation: 38265

If you're new to XML processing:

lxml is a fast and powerful library for working with XML in Python. The standard library doesn't have full XPath support.
XPath is a query language for examining XML documents. It has a steep learning curve, but it's easy to get help with on StackOverflow. XPath is so useful that I've started casting JSON to XML when using APIs just so that I can write XPath queries instead of crazy nested dictionary dereferencing.

from lxml import etree
from pprint import pprint

doc = etree.XML('''<?xml version="1.0" encoding="utf-8"?>
<SearchResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Student>
    <RollNumber>1</RollNumber>
    <Name>Abel</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>[email protected]</Email>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>2</RollNumber>
    <Name>Joseph</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>[email protected]</Email>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>3</RollNumber>
    <Name>Mike</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>[email protected]</Email>
    <Grade>7</Grade>
</Student>
</SearchResults>''')

def first(seq,default=None):
  for item in seq:
    return item
  return default

def simple_children_to_dict(element):
  result = {}
  for child in element:
    result[child.tag] = child.text
  return result

def get_by_rollnumber(number,search_results):
  student_element = first(search_results.xpath('Student[./RollNumber=$number]',number=number))
  if student_element is None:
    raise Exception("Student Number {0} not found".format(number))
  return simple_children_to_dict(student_element)  

def get_all_students(search_results):
  students = []
  for student_element in search_results.xpath('Student'):
    students.append(simple_children_to_dict(student_element))
  return students

Then:

>>> pprint(get_by_rollnumber(2,doc))
{'Email': '[email protected]',
 'Grade': '7',
 'Name': 'Joseph',
 'PhoneNumber': 'Not Included',
 'RollNumber': '2'}
>>>
>>> pprint(get_all_students(doc))
[{'Email': '[email protected]',
  'Grade': '7',
  'Name': 'Abel',
  'PhoneNumber': 'Not Included',
  'RollNumber': '1'},
 {'Email': '[email protected]',
  'Grade': '7',
  'Name': 'Joseph',
  'PhoneNumber': 'Not Included',
  'RollNumber': '2'},
 {'Email': '[email protected]',
  'Grade': '7',
  'Name': 'Mike',
  'PhoneNumber': 'Not Included',
  'RollNumber': '3'}]

Subtleties:

XPath queries usually return a result set, because most queries could match more than one element. Hence the use of the first helper function.
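For comparison, the same "take the first match or a default" idea can be sketched with the standard library alone, assuming the limited path subset that ElementTree supports is enough for the query. The XML here is a shortened stand-in for the document in the question, and next(iter(...), None) plays the role of the first helper; the code is written with print() so it runs on Python 3.

```python
from xml.etree import ElementTree

# Shortened stand-in for the document in the question.
doc = ElementTree.fromstring('''<SearchResults>
<Student><RollNumber>1</RollNumber><Name>Abel</Name></Student>
<Student><RollNumber>2</RollNumber><Name>Joseph</Name></Student>
</SearchResults>''')

# findall() always returns a list, even for zero or one match.
matches = doc.findall("Student[RollNumber='2']")

# next() with a default does the same job as the first() helper.
student = next(iter(matches), None)
missing = next(iter(doc.findall("Student[RollNumber='9']")), None)

print(student.find('Name').text)
print(missing)
```

A query with no match yields an empty list, so the default (None here) has to be checked before the result is used.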

Upvotes: 1

Martijn Pieters

Reputation: 1124378

You have a double loop here:

for students in results:
    for student in students:
        print student.get('Name')

students is one <Student> element. By iterating over that you get individual elements contained in that element. Those contained elements (<RollNumber>, <Name>, etc) have no Name attribute.
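A small sketch of the difference between attributes and child elements may help. The id attribute below is hypothetical and does not appear in the XML from the question; it is only there to show what .get() does find. The code uses Python 3 syntax.

```python
from xml.etree import ElementTree

# Hypothetical element: 'id' is an attribute, <Name> is a child element.
student = ElementTree.fromstring('<Student id="s1"><Name>Abel</Name></Student>')

attr_value = student.get('id')         # attribute lookup
no_such_attr = student.get('Name')     # no attribute called Name, so None
children = [child.tag for child in student]  # iterating yields child elements
name_text = student.find('Name').text  # child element lookup, then its text

print(attr_value, no_such_attr, children, name_text)
```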

The .get() method only accesses attributes, but you appear to want the <Name> element. Use .find() or an XPath expression here instead:

for student in results:
    name = student.find('Name')
    if name is not None:
        print name.text

or, with a single path expression:

for student_name in xml.findall('.//Student/Name'):
    print student_name.text
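Since the stated goal is a dict per student, here is a minimal sketch of one way to build them with ElementTree only. It assumes every child of <Student> is a simple element with text and no nesting, uses a trimmed copy of the question's XML, and is written for Python 3 (print() as a function).

```python
from xml.etree import ElementTree

# Trimmed copy of the XML from the question.
xml_text = '''<SearchResults>
<Student>
    <RollNumber>1</RollNumber>
    <Name>Abel</Name>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>2</RollNumber>
    <Name>Joseph</Name>
    <Grade>7</Grade>
</Student>
</SearchResults>'''

root = ElementTree.fromstring(xml_text)

# One dict per <Student>, mapping each child's tag to its text.
students = [
    {child.tag: child.text for child in student}
    for student in root.findall('Student')
]

for record in students:
    print(record['RollNumber'], record['Name'])
```

If two children of the same <Student> shared a tag, the later one would overwrite the earlier one in the dict, so this shape only fits flat records like the ones shown.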

Upvotes: 6
