Extract all the text from xml data with python

Question

I'm new to xml data processing. I want to extract the text data in the following xml file:


    1234545667abcde

so that the expected result is: ['12345', '45667', 'abcde']. Currently I have tried:

import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
data = list(tree.iter())  # getiterator() was removed in Python 3.9
text = [el.text for el in data]

But the result only shows ['12345', '45667']; 'abcde' is missing. Can someone help me? Thanks in advance!

Gilles Quénot · Accepted Answer

Try this using XPath and lxml:

import lxml.etree as etree

string = '''

    1234545667abcde

'''

tree = etree.fromstring(string)

print(tree.xpath('//p//text()'))

The XPath expression //p//text() selects every text node nested at any depth inside a p element, which gives:

['12345', '45667', 'abcde']
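If installing lxml is not an option, the standard library can probably do the same job. The sketch below assumes a markup shape (a p element holding two b elements followed by loose text), because the tags of the sample are not visible in the question; the tag names are hypothetical. It illustrates why the original attempt missed 'abcde': in ElementTree, text that follows a closing tag is stored in that element's .tail, not in any element's .text.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag names are assumed, not taken from the question.
xml_string = "<data><p><b>12345</b><b>45667</b>abcde</p></data>"

root = ET.fromstring(xml_string)

# .text only covers text before an element's first child, so 'abcde' is missed.
texts_only = [el.text for el in root.iter() if el.text]

# 'abcde' follows the second closing </b> tag, so it is stored as that element's .tail.
tails_only = [el.tail for el in root.iter() if el.tail]

# itertext() yields .text and .tail strings together, in document order.
all_text = [t for t in root.itertext() if t.strip()]

print(texts_only)
print(tails_only)
print(all_text)
```

The t.strip() filter is there to drop whitespace-only strings, which show up when the real file is indented across several lines.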
