wrufesh

Reputation: 1406

Python: parsing XML with ElementTree doesn't give the expected result

I have an XML file like this:

<?xml version="1.0"?>
<sample>
    <text>My name is <b>Wrufesh</b>. What is yours?</text>
</sample>

I have Python code like this:

import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for child in root:
    print(child.text)  # text is an attribute, not a method

I only get

'My name is ' as output.

I want to get

'My name is <b>Wrufesh</b>. What is yours?' as output.

What can I do?

Upvotes: 2

Views: 763

Answers (4)

Alex Ball

Reputation: 103

An Element object has two attributes of interest here:

  • text is the first text node inside the element, that is, the text between the opening tag and the first child element.
  • tail is the text node immediately following the element, that is, the text between the element's closing tag and the next tag (opening or closing). ET.tostring() includes the tail in its output as well.
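
To see both attributes on the question's document, here is a small sketch that parses it from a string with ET.fromstring() (so no sample.xml is needed); the variable names are only for illustration:

```python
import xml.etree.ElementTree as ET

# Same markup as the question, parsed from a string instead of sample.xml
root = ET.fromstring(
    "<sample><text>My name is <b>Wrufesh</b>. What is yours?</text></sample>"
)
text_elem = root.find("text")
b_elem = text_elem.find("b")

print(repr(text_elem.text))  # text before the first child: 'My name is '
print(repr(b_elem.text))     # text inside <b>: 'Wrufesh'
print(repr(b_elem.tail))     # text after </b>: '. What is yours?'
```

The sentence is split across three nodes, which is why reading only child.text stops at 'My name is '.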

So all you need is this:

import xml.etree.ElementTree as ET
root = ET.parse('sample.xml').getroot()
for child in root:
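    output = child.text or ""  # .text is None if the element starts with a child tag
    for grandchild in child:
        # tostring() also emits the grandchild's tail, e.g. '<b>Wrufesh</b>. What is yours?'
        output += ET.tostring(grandchild, encoding="unicode")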
    print(output)

Output is:

My name is <b>Wrufesh</b>. What is yours?
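
One caveat with the loop above: child.text comes back unescaped from the parser, while ET.tostring() escapes the children and their tails, so text containing & or < ends up inconsistent with the rest of the string. A sketch of a helper that re-escapes the leading text, assuming xml.sax.saxutils.escape (not used in the original answer) is acceptable; inner_xml is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(elem):
    """Hypothetical helper: the markup between elem's start and end tags."""
    # elem.text was unescaped by the parser; escape it again so it matches
    # the escaped form that ET.tostring() produces for the children
    parts = [escape(elem.text or "")]
    for child in elem:
        # tostring() includes each child's tail, already escaped
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


elem = ET.fromstring("<text>Tom &amp; <b>Jerry</b> &lt;3</text>")
print(inner_xml(elem))  # Tom &amp; <b>Jerry</b> &lt;3
```

Without the escape() call the result would start with a bare 'Tom & ', which is not valid XML.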

Upvotes: 0

Jerome Anthony

Reputation: 8021

I would suggest pre-processing the XML file to wrap the contents of the <text> element in a CDATA section. After that, you should be able to read the whole value as text without a problem.

<text><![CDATA[My name is <b>Wrufesh</b>. What is yours?]]></text>
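
A minimal sketch of that pre-processing step, assuming a regular-expression rewrite (my choice, not part of the answer) is good enough. It only holds if <text> has no attributes, is never nested, and its content never contains "]]>":

```python
import re
import xml.etree.ElementTree as ET

raw = """<?xml version="1.0"?>
<sample>
    <text>My name is <b>Wrufesh</b>. What is yours?</text>
</sample>"""

# Wrap the contents of every <text>...</text> in a CDATA section
wrapped = re.sub(
    r"<text>(.*?)</text>", r"<text><![CDATA[\1]]></text>", raw, flags=re.DOTALL
)

root = ET.fromstring(wrapped)
print(root.find("text").text)  # My name is <b>Wrufesh</b>. What is yours?
```

The parser reports CDATA content as ordinary character data, so <b> is no longer a child element and .text holds the whole sentence.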

Upvotes: 0

mhawke

Reputation: 87074

You can get your desired output using ElementTree's tostringlist():

>>> import xml.etree.ElementTree as ET
>>> root = ET.parse('sample.xml').getroot()
>>> l = ET.tostringlist(root.find('text'), encoding='unicode')
>>> l
['<text', '>', 'My name is ', '<b', '>', 'Wrufesh', '</b>', '. What is yours?', '</text>', '\n']
>>> ''.join(l[2:-2])
'My name is <b>Wrufesh</b>. What is yours?'

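I wonder, though, how practical this is for generic use: the slice l[2:-2] only works because <text> has no attributes and is followed by a tail ('\n'). An attribute adds a fragment after '<text', and a missing tail removes the trailing '\n', so either one breaks the slice.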
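
A sketch of a variant that does not depend on fragment positions: serialize the element once and cut at its own start and end tags. The helper name inner_xml_via_tostring is hypothetical, and the approach relies on ElementTree escaping < and > in text and attribute values:

```python
import copy
import xml.etree.ElementTree as ET


def inner_xml_via_tostring(elem):
    """Hypothetical helper: serialize elem and slice off its own start and end tags."""
    if not elem.text and len(elem) == 0:
        return ""  # an empty element serializes as '<tag />', with nothing inside
    clone = copy.copy(elem)  # shallow copy, so the original element is untouched
    clone.tail = None        # drop the element's own tail so the string ends with '</tag>'
    s = ET.tostring(clone, encoding="unicode")
    # '<' and '>' inside text and attributes are escaped, so the first '>'
    # closes the start tag and the last '</' opens the end tag
    return s[s.index(">") + 1 : s.rindex("</")]


root = ET.fromstring(
    '<sample><text lang="en">My name is <b>Wrufesh</b>. What is yours?</text>\n</sample>'
)
print(inner_xml_via_tostring(root.find("text")))  # My name is <b>Wrufesh</b>. What is yours?
```

The attribute and the '\n' tail that would break the l[2:-2] slice have no effect here.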

Upvotes: 2

Stephen Lin

Reputation: 4912

I don't think treating the tags in the XML as strings is right. You can access the text parts of the XML like this:

#!/usr/bin/env python
# -*- coding:utf-8 -*- 

import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
text = root[0]
for i in text.itertext():
    print(i)

# As you can see, <b> and </b> are a pair of tags, not strings;
# <b> shows up as a child Element of <text>.
print(list(text))
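
If plain text without the markup is acceptable, joining the pieces that itertext() yields gives the whole sentence as one string. A sketch, parsing the question's document from a string rather than sample.xml:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<sample><text>My name is <b>Wrufesh</b>. What is yours?</text></sample>"
)
text_elem = root[0]

# itertext() yields the element's text, its children's text, and their tails
# in document order, skipping the tags themselves
plain = "".join(text_elem.itertext())
print(plain)  # My name is Wrufesh. What is yours?

# <b> is a child Element, not part of any string
print([child.tag for child in text_elem])  # ['b']
```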

Upvotes: 0
