Parsing XML with Python: Keeping text within attribute while deleting tag around it

Question

Input:


 The name of the third river is
Hiddekel: this is the one which flows in front of Assyria. The fourth
river is the Euphrates.

Desired Output:



 The name of the third river is Hiddekel: this is the one which flows in front of Assyria. The fourth river is the Euphrates.

Hi there, I would like to figure out a way to extract the text from a sub-element (placeName) and put it back into the larger body of text. I have similar issues elsewhere in the XML file, such as with names of people. I would like to be able to extract names and places without getting rid of the milestones. Thank you for your help!

Current code:

for p in chapter.findall('p'):
    i = 1
    for text in p.itertext():
        file.write(body.attrib["n"] + " " + chapter.attrib["n"] + ":" +  str(i) + text)
        i = i + 1

Jack Fleeting · Accepted Answer

It can be done with BeautifulSoup and its unwrap() method, which removes a tag but leaves its contents in place:

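from bs4 import BeautifulSoup as bs

snippet = """your XML above"""

# The 'lxml' HTML parser lowercases tag names, so <placeName> is found as 'placename'
soup = bs(snippet, 'lxml')
for p in soup.find_all('placename'):
    p.unwrap()  # drop the tag itself, keep its text where it was
print(soup)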

Output:



 The name of the third river is
Hiddekel: this is the one which flows in front of Assyria. The fourth
river is the Euphrates.
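
If you would rather stay with the standard library (the findall and itertext calls in the question look like xml.etree.ElementTree), the same unwrapping can probably be done by hand. The following is only a minimal sketch: the sample markup, including the milestone attributes, is an assumption made up for illustration because the real file is not shown, and unwrap_tag is a hypothetical helper, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the attribute names on <milestone> are assumed.
sample = (
    '<p><milestone unit="verse" n="14"/> The name of the third river is '
    '<placeName>Hiddekel</placeName>: this is the one which flows in front '
    'of Assyria. The fourth river is the <placeName>Euphrates</placeName>.</p>'
)


def unwrap_tag(root, tag):
    """Remove every <tag> element below root, keeping its text and children."""
    # Reverse document order, so nested matches are handled before outer ones.
    for parent in reversed(list(root.iter())):
        for child in list(parent):
            if child.tag != tag:
                continue
            idx = list(parent).index(child)
            grandchildren = list(child)
            lead = child.text or ""
            if grandchildren:
                # The removed element's tail now follows its last child.
                last = grandchildren[-1]
                last.tail = (last.tail or "") + (child.tail or "")
            else:
                lead += child.tail or ""
            # Attach the leading text to whatever comes just before the element.
            if idx == 0:
                parent.text = (parent.text or "") + lead
            else:
                prev = parent[idx - 1]
                prev.tail = (prev.tail or "") + lead
            parent.remove(child)
            # Move any children of the removed element up into its place.
            for offset, grandchild in enumerate(grandchildren):
                parent.insert(idx + offset, grandchild)


root = ET.fromstring(sample)
unwrap_tag(root, "placeName")
print(ET.tostring(root, encoding="unicode"))
```

Because ElementTree stores the text that follows an element in that element's tail, the unwrapped text ends up in the tail of the milestone, so the milestone itself is left in place. Tag names stay case-sensitive here, hence "placeName" rather than "placename".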
