Extract XML text when elements appear in between the text

Question

I have this xml file:



    
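<do title="Example document" date="today">
<db descr="First level">
    <P>
        Some text here that
        <af d="reference 1">continues</af>
        but then has some more stuff.
    </P>
</db>
</do>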

and I need to parse it to extract its text. I am using xml.etree.ElementTree for this (see documentation).

This is the simple code I use to parse and explore the file:

import xml.etree.ElementTree as ET
tree = ET.parse(file_path)
root = tree.getroot()

def explore_element(element):
    print(element.tag)
    print(element.attrib)
    print(element.text)
    for child in element:
        explore_element(child)

explore_element(root)

Things work as expected, except that the P element does not have the complete text. In particular, I seem to be missing "but then has some more stuff" (the text in P that comes after the af element).

The XML files are a given, so I cannot restructure them, even if there is a better recommended way to write them (and there are too many to fix manually).

Is there a way I can get all the text?

The output that my code produces (in case it helps) is this:

do
{'title': 'Example document', 'date': 'today'}

db
{'descr': 'First level'}

P 
{}
        Some text here that

af
{'d': 'reference 1'}
continues

EDIT:

The accepted answer made me realize I had not read the documentation as closely as I should. People with related problems may also find .tail useful.
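A rough sketch of how .tail fills the gap: the text after a child element is stored on that child's .tail, not on the parent's .text. The sketch assumes the file has the structure implied by the output above (do > db > P > af), inlined here as a string; full_text is a hypothetical helper name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the sample document, inlined so the sketch is self-contained.
XML = """<do title="Example document" date="today">
<db descr="First level">
    <P>
        Some text here that
        <af d="reference 1">continues</af>
        but then has some more stuff.
    </P>
</db>
</do>"""


def full_text(element):
    """Rebuild an element's text from its .text, each child's text, and each child's .tail."""
    parts = [element.text or ""]
    for child in element:
        parts.append(full_text(child))   # text inside the child (recursively)
        parts.append(child.tail or "")   # text between the child's end tag and the next tag
    return "".join(parts)


root = ET.fromstring(XML)
p = root.find("./db/P")
af = p.find("af")

print(repr(af.tail))                   # the "missing" text lives here
print(" ".join(full_text(p).split()))  # collapse whitespace for display
```

This does by hand roughly what itertext() does, so it is mainly useful for seeing where each piece of text is stored.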

DirtyBit · Accepted Answer

Using BeautifulSoup:

list_test.xml:



    
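<do title="Example document" date="today">
<db descr="First level">
    <P>
        Some text here that
        <af d="reference 1">continues</af>
        but then has some more stuff.
    </P>
</db>
</do>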

and then:

from bs4 import BeautifulSoup

with open('list_test.xml', 'r') as f:
    soup = BeautifulSoup(f.read(), "html.parser")
    for line in soup.find_all('p'):
        print(line.text)

OUTPUT:

Some text here that
continues
but then has some more stuff.

EDIT:

Using ElementTree:

import xml.etree.ElementTree as ET
xml = '<P>Some text here that <af d="reference 1">continues</af> but then has some more stuff.</P>'
tree = ET.fromstring(xml)
print(''.join(tree.itertext()))

OUTPUT:

Some text here that continues but then has some more stuff.
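The same itertext() call can be applied to a whole document rather than a one-line string. The following is a minimal sketch, assuming the file has the structure implied by the question's output (do > db > P > af); the sample is inlined as a string, and paragraph_texts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed structure of the question's file, inlined so the sketch runs on its own.
SAMPLE = """<do title="Example document" date="today">
<db descr="First level">
    <P>
        Some text here that
        <af d="reference 1">continues</af>
        but then has some more stuff.
    </P>
</db>
</do>"""


def paragraph_texts(root, tag="P"):
    """Return the whitespace-normalized text of every element with the given tag."""
    return [
        " ".join("".join(element.itertext()).split())  # join all text pieces, then collapse whitespace
        for element in root.iter(tag)
    ]


texts = paragraph_texts(ET.fromstring(SAMPLE))
print(texts)
```

For a file on disk, ET.parse(file_path).getroot() would take the place of ET.fromstring(SAMPLE). The whitespace collapsing is a display choice; drop it if the original line breaks matter.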
