xaratustra

Reputation: 679

Getting elementtree tag texts by partial tag names

In an XML document, I have an element with a DateTime tag, which can be extracted using:

for elem in xml_tree_root.iter(tag='DateTime'):
    print(elem.text)

In another version of the same XML file, the tag's name is blahblooDateTimebloobli, so I need something like:

for elem in xml_tree_root.iter(tag='*DateTime*'):
    print(elem.text)

that would work for both versions of the XML. However, the latter matches nothing, even though a bare '*' matches everything, which suggests that wildcards are supported in some form. Is it possible to pass a regular expression to ElementTree's iter search?

Upvotes: 2

Views: 723

Answers (2)

Wiktor Stribiżew

Reputation: 627517

It looks as if you simply want the text of every element whose tag name contains the substring DateTime.

In this case, you can use

values = [e.text for e in xml_tree_root.iter('*') if 'DateTime' in e.tag]
print(values)

That is, iterate over all the tags and if the tag name contains DateTime, get the node text value.
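Since the question asks about regular expressions specifically, the same filtering idea can be combined with the re module. The following is only a sketch: the sample document and the helper name iter_by_tag_pattern are made up for illustration, and ElementTree itself offers no regex argument for iter.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document covering both tag-name variants.
SAMPLE_XML = """
<root>
  <DateTime>2020-01-01T00:00:00</DateTime>
  <blahblooDateTimebloobli>2021-06-15T12:30:00</blahblooDateTimebloobli>
  <Other>ignored</Other>
</root>
"""


def iter_by_tag_pattern(root, pattern):
    """Yield elements whose tag name matches the regular expression.

    Hypothetical helper, not part of ElementTree.
    """
    regex = re.compile(pattern)
    for elem in root.iter():
        # Comments and processing instructions carry a non-string tag.
        if isinstance(elem.tag, str) and regex.search(elem.tag):
            yield elem


xml_tree_root = ET.fromstring(SAMPLE_XML)
values = [elem.text for elem in iter_by_tag_pattern(xml_tree_root, r"DateTime")]
print(values)
```

Note that for namespaced documents elem.tag has the form '{namespace}localname', so the pattern is matched against that full string.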

Upvotes: 2

trincot

Reputation: 351369

The documentation of Element.iter is clear:

iter(tag=None)

[...] If tag is not None or '*', only elements whose tag equals tag are returned from the iterator.

So there is no support for partial wildcards; the only special value is the catch-all '*', which matches every element.
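A quick check illustrates this behaviour. The tiny document below is invented for the demonstration; the point is only that '*DateTime*' is compared literally against each tag name, while '*' alone is special-cased.

```python
import xml.etree.ElementTree as ET

# Invented two-element document for the demonstration.
root = ET.fromstring(
    "<root><DateTime>a</DateTime><xDateTimey>b</xDateTimey></root>"
)

# Compared literally against each tag name, so nothing matches.
literal = [e.tag for e in root.iter('*DateTime*')]

# The bare '*' is the one special value: it matches every element,
# including the root itself.
everything = [e.tag for e in root.iter('*')]

print(literal)
print(everything)
```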

If you know the two variants, then just chain two iterators:

from itertools import chain

for elem in chain(xml_tree_root.iter(tag='DateTime'), xml_tree_root.iter(tag='blahblooDateTimebloobli')):
    print(elem.text)
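One thing to be aware of with chaining: results come back grouped by tag name rather than in document order. If order matters, a single pass with a membership test is an alternative. This is a sketch with an invented sample document; KNOWN_TAGS is an illustrative name.

```python
from itertools import chain
import xml.etree.ElementTree as ET

# Invented sample in which the long variant appears first.
xml_tree_root = ET.fromstring(
    "<root>"
    "<blahblooDateTimebloobli>first</blahblooDateTimebloobli>"
    "<DateTime>second</DateTime>"
    "</root>"
)

KNOWN_TAGS = ("DateTime", "blahblooDateTimebloobli")

# Chained iterators: all 'DateTime' matches, then all matches of the other tag.
chained = [
    e.text
    for e in chain.from_iterable(xml_tree_root.iter(t) for t in KNOWN_TAGS)
]

# Single pass: elements are returned in document order.
in_order = [e.text for e in xml_tree_root.iter() if e.tag in KNOWN_TAGS]

print(chained)
print(in_order)
```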

Upvotes: 1
