Reputation: 1406
I have an XML file like this:
<?xml version="1.0"?>
<sample>
<text>My name is <b>Wrufesh</b>. What is yours?</text>
</sample>
I have Python code like this:
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
for child in root:
    # text is an attribute, not a method, so it must not be called
    print(child.text)
I only get 'My name is' as output.
I want to get 'My name is <b>Wrufesh</b>. What is yours?' as output.
What can I do?
Upvotes: 2
Views: 763
Reputation: 103
An Element object has two attributes of interest here:
text is the first text node inside the element, that is, the text between the opening tag and the first child element.
tail is the text node immediately following the element, that is, the text between the closing tag and the next tag (opening or closing).
ElementTree.tostring() includes the tail as well. So all you need is this:
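import xml.etree.ElementTree as ET
root = ET.parse('sample.xml').getroot()
for child in root:
    # text is None when the element starts directly with a child tag
    output = child.text or ''
    for grandchild in child:
        # tostring() serializes the grandchild together with its tail
        output += ET.tostring(grandchild, encoding="unicode")
    print(output)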
Output is:
My name is <b>Wrufesh</b>. What is yours?
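To make the text/tail split concrete, here is a minimal sketch. It parses the sample document from a string rather than from sample.xml (an assumption, so the snippet is self-contained) and wraps the loop above in a helper. The name inner_xml is hypothetical and not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Assumption: the sample document is inlined instead of read from sample.xml.
SAMPLE = "<sample><text>My name is <b>Wrufesh</b>. What is yours?</text></sample>"


def inner_xml(element):
    """Return the element's leading text plus its serialized children.

    Hypothetical helper, not an ElementTree API.
    """
    # element.text is None when the element starts directly with a child tag.
    parts = [element.text or ""]
    for child in element:
        # tostring() emits the child element followed by its tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(SAMPLE)
text = root.find("text")
b = text.find("b")

print(repr(text.text))  # text between <text> and <b>
print(repr(b.text))     # text inside <b>
print(repr(b.tail))     # text between </b> and </text>
print(inner_xml(text))
```

The helper only adds the `or ""` guard and the function wrapper; the technique is the same as in the loop above.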
Upvotes: 0
Reputation: 8021
I would suggest pre-processing the XML file to wrap the contents of the <text> element in CDATA. You should be able to read the value without a problem afterwards.
<text><![CDATA[My name is <b>Wrufesh</b>. What is yours?]]></text>
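A small sketch of what the parser sees once the content is wrapped in CDATA. The document is inlined as a string here (an assumption, to keep the snippet self-contained), and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumption: the pre-processed document, inlined instead of read from a file.
CDATA_SAMPLE = (
    "<sample>"
    "<text><![CDATA[My name is <b>Wrufesh</b>. What is yours?]]></text>"
    "</sample>"
)

root = ET.fromstring(CDATA_SAMPLE)
text = root.find("text")

# Inside CDATA the <b> markup is plain character data, so the whole
# sentence ends up in text.text and no child element is created.
cdata_value = text.text
child_count = len(text)

print(cdata_value)
print(child_count)
```

The trade-off is that <b> is no longer an element in the parsed tree, so it cannot be found with find() or iterated over.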
Upvotes: 0
Reputation: 87074
You can get your desired output using ElementTree.tostringlist():
>>> import xml.etree.ElementTree as ET
>>> root = ET.parse('sample.xml').getroot()
>>> l = ET.tostringlist(root.find('text'))
>>> l
['<text', '>', 'My name is ', '<b', '>', 'Wrufesh', '</b>', '. What is yours?', '</text>', '\n']
>>> ''.join(l[2:-2])
'My name is <b>Wrufesh</b>. What is yours?'
I wonder, though, how practical this is going to be for generic use, since the slice indices depend on how the serializer happens to split its output into chunks.
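One way to avoid depending on the chunk boundaries is to join the output first and then cut off the outer tag by position in the string. This is only a sketch under assumptions: inner_markup is a hypothetical helper, the document is inlined as a string, and the slicing assumes the serialized start tag contains no literal ">" before its end.

```python
import xml.etree.ElementTree as ET

# Assumption: the sample document is inlined instead of read from sample.xml.
SAMPLE = "<sample><text>My name is <b>Wrufesh</b>. What is yours?</text></sample>"


def inner_markup(element):
    """Serialize the element and strip its own start and end tags.

    Hypothetical helper, not an ElementTree API.
    """
    serialized = ET.tostring(element, encoding="unicode")
    start = serialized.find(">") + 1  # just past the start tag
    end = serialized.rfind("</")      # start of the element's end tag
    if end < start:
        # Self-closing element such as <text />: there is no content.
        return ""
    return serialized[start:end]


element = ET.fromstring(SAMPLE).find("text")

# Joining the chunks gives the same string as tostring(), however it is split.
joined = "".join(ET.tostringlist(element, encoding="unicode"))
markup = inner_markup(element)
print(markup)
```

Any tail text after the element's closing tag is dropped as well, because the slice ends at the last "</".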
Upvotes: 2
Reputation: 4912
I don't think treating tags in XML as strings is right. You can access the text parts of the XML like this:
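#!/usr/bin/env python
# -*- coding:utf-8 -*-
import xml.etree.ElementTree as ET
tree = ET.parse('sample.xml')
root = tree.getroot()
text = root[0]
# itertext() yields each text fragment in document order, without the tags
for i in text.itertext():
    print(i)
# As you can see, `<b>` and `</b>` form a pair of tags (a child element), not strings.
# list(text) returns the child elements without relying on the private _children attribute.
print(list(text))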
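For comparison, a short sketch of what itertext() produces for the sample document, inlined as a string here (an assumption, to keep the snippet self-contained). The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: the sample document is inlined instead of read from sample.xml.
SAMPLE = "<sample><text>My name is <b>Wrufesh</b>. What is yours?</text></sample>"

text = ET.fromstring(SAMPLE)[0]

# itertext() yields every text and tail string in document order, without tags.
pieces = list(text.itertext())
plain = "".join(pieces)

# Iterating an element yields its child elements; <b> is the only one here.
child_tags = [child.tag for child in text]

print(pieces)
print(plain)
print(child_tags)
```

Note that this gives the sentence with the markup removed, which differs from the output requested in the question.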
Upvotes: 0