Reputation: 3299
The idea is to get the value of tag endTime
for the following xml
:
<epochs xmlns="http://www.egi.com/epochs_mff" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<epoch>
<beginTime>0</beginTime>
<endTime>3586221000</endTime>
<firstBlock>1</firstBlock>
<lastBlock>897</lastBlock>
</epoch>
<epoch>
<beginTime>3750143000</beginTime>
<endTime>5549485000</endTime>
<firstBlock>898</firstBlock>
<lastBlock>1347</lastBlock>
</epoch>
</epochs>
Yet, accessing the tag directly return an empty list:
import xml.etree.ElementTree as ET
tree = ET.parse(r'epochs.xml')
epoch_list=tree.findall("epoch")
However, looping through the tree
does return the endTime
value.
import xml.etree.ElementTree as ET
tree = ET.parse(r'epochs.xml')
for elem in tree:
for subelem in elem:
print(subelem.text)
May I know how can I retrieve directly the endTime
with the value of 300937000?
Upvotes: 1
Views: 800
Reputation: 30991
The reason your code failed is that your XML uses a default namespace (xmlns="http://...").
But your call to findall contains epoch without any namespace, so it is not likely to find anything.
To process namespaced XML, you have to:
Something like:
ns = {'ep': 'http://www.egi.com/epochs_mff'}
epoch_list = tree.findall('ep:epoch', ns)
Then the result is:
[<Element '{http://www.egi.com/epochs_mff}epoch' at 0x...>]
And to get the content your endTime element, if you don't care about any intermediate elements in the XML tree, run:
tree.findtext('.//ep:endTime', namespaces=ns)
Other choice is to pass full XML path, starting from the content of the root element, but remember about the namespace prefix at each step:
tree.findtext('ep:epoch/ep:endTime', namespaces=ns)
If you have multiple endTime elements, one of possible solutions is to process them in a loop.
This time findtext is useless as it finds only the first matching element. You should use a loop based on findall and then (within the loop) retrieve the text of the current element and make the intended use of it, e.g.:
for it in tree.findall('ep:epoch/ep:endTime', namespaces=ns):
print(it.text)
Of course, replace print with whatever you need to consume the text found.
Upvotes: 1