Reputation: 763
I'm completely stumped as to why lxml's .text gives me the text for a child tag but not for the root tag.
some_tag = etree.fromstring('<some_tag class="abc"><strong>Hello</strong> World</some_tag>')
some_tag.find("strong")
Out[195]: <Element strong at 0x7427d00>
some_tag.find("strong").text
Out[196]: 'Hello'
some_tag
Out[197]: <Element some_tag at 0x7bee508>
some_tag.text
some_tag.find("strong").text returns the text between the <strong> tags.
I expect some_tag.text to return everything between <some_tag> ... </some_tag>.
Expected:
<strong>Hello</strong> World
Instead, it returns nothing.
Upvotes: 8
Views: 10682
Reputation: 33
You have to use a built-in lxml method to retrieve all the text inside the tag.
from lxml import etree
xml='''<some_tag class="abc"><strong>Hello</strong> World</some_tag>'''
tree = etree.fromstring(xml)
print(''.join(tree.xpath('//text()')))
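As a possible alternative, here is a minimal sketch that limits the text collection to one element with itertext(), whereas '//text()' selects every text node in the document. Note that both approaches return the text only ("Hello World"), not the <strong> markup the question expected. The fallback import is an assumption added so the sketch also runs where lxml is not installed; the standard library's ElementTree exposes the same method.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library, which also provides itertext()
    import xml.etree.ElementTree as etree

some_tag = etree.fromstring('<some_tag class="abc"><strong>Hello</strong> World</some_tag>')

# itertext() yields each text and tail string inside the element, in document order
text_only = "".join(some_tag.itertext())
print(text_only)  # Hello World
```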
Upvotes: 0
Reputation: 51042
from lxml import etree
XML = '<some_tag class="abc"><strong>Hello</strong> World</some_tag>'
some_tag = etree.fromstring(XML)
for element in some_tag:
    print(element.tag, element.text, element.tail)
Output:
strong Hello  World
For information on the .text and .tail properties, see:
To get exactly the result that you expected, use
print(etree.tostring(some_tag.find("strong"), encoding="unicode"))
Output:
<strong>Hello</strong> World
Upvotes: 10
Reputation: 10923
Does this help?
comp = [etree.tostring(e, encoding="unicode") for e in some_tag]
print(''.join(comp))
EDITED: Thanks @mzjin for putting me on the right track
Upvotes: 0
Reputation: 13232
You'll find the missing text here:
>>> some_tag.find("strong").tail
' World'
Look at http://lxml.de/tutorial.html and search for "tail".
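To make the split concrete, here is a small sketch using a made-up <p> document (an illustration, not taken from the question). Text before the first child belongs to the parent's .text, and text after a child's closing tag belongs to that child's .tail. The fallback import is an assumption so the sketch also runs without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the standard library's ElementTree shares the .text/.tail model
    import xml.etree.ElementTree as etree

# Illustrative mixed-content document (hypothetical)
p = etree.fromstring("<p>one<b>two</b>three<i>four</i>five</p>")

pieces = [("p.text", p.text)]
for child in p:
    pieces.append((child.tag + ".text", child.text))
    pieces.append((child.tag + ".tail", child.tail))

for name, value in pieces:
    print(name, repr(value))

# In the question's document nothing precedes <strong>, so the root's .text is None
some_tag = etree.fromstring('<some_tag class="abc"><strong>Hello</strong> World</some_tag>')
root_text = some_tag.text
print(repr(root_text))  # None
```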
Upvotes: 1
Reputation: 1100
I'm not sure I understand your question, but there are two main approaches to parsing:
DOM parser: depending on the language, the text is available through something like node.getNodeValue();
SAX parser: depending on the language; in Java, for example, the text is delivered to the characters(...) function.
I don't have time to search right now, but in Python I know of MiniDOM (a DOM parser): http://www.blog.pythonlibrary.org/2010/11/12/python-parsing-xml-with-minidom/
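For what it's worth, a rough sketch of the DOM route with the standard library's xml.dom.minidom might look like the following. In a DOM tree the trailing text is a separate text node, a sibling of <strong>, so joining the serialized child nodes yields the inner markup. The variable names are illustrative only.

```python
from xml.dom import minidom

doc = minidom.parseString('<some_tag class="abc"><strong>Hello</strong> World</some_tag>')
root = doc.documentElement

# The root has two child nodes: the <strong> element and a text node
trailing_text = root.childNodes[1].nodeValue

# toxml() serializes each node; a text node serializes to its (escaped) text
inner = "".join(child.toxml() for child in root.childNodes)
print(inner)  # <strong>Hello</strong> World
```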
I hope my answer can help you.
Upvotes: 0