Reputation: 305
****EDITED TO ADD ROOT ELEMENT IN THE XML (and it changes nothing)****
I'm using Python 3.7.
I have an XML file named 'f':
<root>
<page>
<title>Chapter 1</title>
<content>Welcome to Chapter 1</content>
</page>
<page>
<title>Chapter 2</title>
<content>Welcome to Chapter 2</content>
</page>
</root>
****ALSO EDITED TO ADD: this is part of a larger program, and for various reasons 'f' is an object of type:
<class 'nt.DirEntry'>
I got this object by grabbing the file from a folder using:
for folder in os.scandir(folderPath):
****
I want to extract every piece of text in that XML, regardless of the tags and how they are nested. The result should be:
Chapter 1
Welcome to Chapter 1
Chapter 2
Welcome to Chapter 2
I tried:
import xml.etree.ElementTree as ET
tree = ET.parse(f)
root = tree.getroot()
root.text #returns nothing
#or
root.tostring() #returns AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'tostring'
and I tried:
tree = ET.fromstring(f)
print(''.join(tree.itertext())) #returns TypeError: a bytes-like object is required, not 'nt.DirEntry'
thank you!
Upvotes: 0
Views: 1129
Reputation: 30971
Use the following code:
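import xml.etree.ElementTree as et

tree = et.parse('input.xml')
root = tree.getroot()
for it in root.iter():
    # it.text is None for elements with no text at all, so fall back to ''
    txt = (it.text or '').strip()
    if txt:
        print(txt)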
The strip call and the if check skip elements that have no text or contain only whitespace characters.
By contrast, the output of the other answer contains two empty lines; this approach does not print them.
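The same filtering idea can be sketched in a self-contained way. This is only an illustration under a few assumptions: the XML is held in a string (SAMPLE_XML) rather than read from a file, and non_empty_texts is a hypothetical helper name, not something from the question. It uses itertext() instead of reading .text on each element, so text that follows a child element (tail text) is picked up as well.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question (assumed to be available as a string here).
SAMPLE_XML = """<root>
<page>
<title>Chapter 1</title>
<content>Welcome to Chapter 1</content>
</page>
<page>
<title>Chapter 2</title>
<content>Welcome to Chapter 2</content>
</page>
</root>"""


def non_empty_texts(root):
    """Return every non-blank text fragment under root, in document order.

    Hypothetical helper: itertext() yields element text and tail text,
    and whitespace-only fragments are dropped after stripping.
    """
    return [piece.strip() for piece in root.itertext() if piece.strip()]


root = ET.fromstring(SAMPLE_XML)
lines = non_empty_texts(root)
print("\n".join(lines))
```

Collecting the fragments in a list first, rather than printing inside the loop, makes it easy to reuse the result or join it with a different separator.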
Upvotes: 0
Reputation: 50947
f is an os.DirEntry object whose path is f.path.
.itertext() is a method on Element objects.
Demo:
import xml.etree.ElementTree as ET
tree = ET.parse(f.path)
root = tree.getroot()
print(''.join(root.itertext()))
Output:
Chapter 1
Welcome to Chapter 1
Chapter 2
Welcome to Chapter 2
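For the situation described in the question, where the entries come from os.scandir, a rough end-to-end sketch might look like the following. The temporary folder, the file name f.xml, the shortened sample XML, and the helper extract_text are all assumptions made so the example can run on its own; they are not part of the original code.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def extract_text(entry):
    """Parse the file behind an os.DirEntry and return its non-blank text pieces.

    Hypothetical helper: entry.path is the path string that ET.parse() receives.
    """
    root = ET.parse(entry.path).getroot()
    return [t.strip() for t in root.itertext() if t.strip()]


# Build a throwaway folder with one XML file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as folder_path:
    with open(os.path.join(folder_path, "f.xml"), "w", encoding="utf-8") as fh:
        fh.write(
            "<root><page><title>Chapter 1</title>"
            "<content>Welcome to Chapter 1</content></page></root>"
        )

    results = {}
    # os.scandir yields os.DirEntry objects (shown as nt.DirEntry on Windows).
    with os.scandir(folder_path) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(".xml"):
                results[entry.name] = extract_text(entry)

print(results)
```

Checking entry.is_file() and the file extension before parsing keeps subfolders and non-XML files from raising a parse error.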
Upvotes: 1