dim_yf_95

Reputation: 59

Finding Tags within an XML file with Python

I need help with my Python code for handling an XML file. I want to get the subtags, store them in lists, and process them. Until now my code worked because I assumed the XML structure was the same in every file. I parsed each file with the ElementTree library, called .findall(tagname), and then processed the resulting lists. But then I realized that some files contain additional tags, so I don't get everything I need. To give you an idea:

<parent tag (same for every file)>
  <tag1>
    .....
  </tag1>
  <tag2>
    .....
  </tag2>
  <tag3>
    .....
  </tag3>
  <unknown tag1>
    .....
  </unknown tag1>
  <unknown tag2>
    .....
  </unknown tag2>
  <tag2>
    .....
  </tag2>
  <tag2>
    .....
  </tag2>
  <unknown tag1>
    .....
  </unknown tag1>
</parent tag>

So my current code is:

list1 = root.findall('tag1')
list2 = root.findall('tag2')
list3 = root.findall('tag3')

and then I process what is inside those tags, which works. I need help detecting every tag under the parent tag and storing the tag names in a list, so I can call findall() for each tag in the list. Something like:

list_of_tags = [tag1, tag2, tag3, unknown tag1, etc]

for tag in list_of_tags:
    ....

Thank you in advance!

I actually parse the XML files with ElementTree like this:

import xml.etree.ElementTree as ET

try:
    tree = ET.parse(filename)
except IOError as e:
    print('No such file or directory')
else:
    root = tree.getroot()

Upvotes: 0

Views: 1904

Answers (2)

dim_yf_95

Reputation: 59

----- SOLUTION -----

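# Collect the tag name of every direct child of the root.
# Element.getchildren() was removed in Python 3.9; iterate over root instead.
tags = []
for child in root:
    tags.append(child.tag)

# Remove duplicate tag names, keeping first-seen order
tags = list(dict.fromkeys(tags))

# Gather the matching elements for each distinct tag
tagslist = []
for tag in tags:
    tagslist = tagslist + root.findall(tag)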
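
If the goal is one list per tag (as with list1, list2, list3 in the question), a dictionary keyed by tag name may be easier to work with than one flat list. This is only a minimal sketch: the sample XML and the helper name group_children_by_tag are made up for illustration and are not from the original post.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, loosely mirroring the structure in the question.
SAMPLE_XML = """<parent>
  <tag1>a</tag1>
  <tag2>b</tag2>
  <unknown1>c</unknown1>
  <tag2>d</tag2>
</parent>"""


def group_children_by_tag(root):
    """Map each direct child tag name to the list of elements with that tag."""
    groups = defaultdict(list)
    for child in root:  # iterating an Element yields its direct children
        groups[child.tag].append(child)
    return dict(groups)


root = ET.fromstring(SAMPLE_XML)
groups = group_children_by_tag(root)

# Each known or unknown tag now has its own list of elements.
for tag, elements in groups.items():
    print(tag, [el.text for el in elements])
```

This walks the children once instead of calling findall() for every tag, and unknown tags are picked up without being listed in advance.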

Upvotes: 0

Zeeshan

Reputation: 1166

You can use xmltodict

pip install xmltodict

And here's how you can get all the child tags under a parent tag

import xmltodict
my_xml = """<parent_tag>
  <tag1>
    .....
  </tag1>
  <tag2>
    .....
  </tag2>
  <tag3>
    .....
  </tag3>
  <unknown_tag1>
    .....
  </unknown_tag1>
  <unknown_tag2>
    .....
  </unknown_tag2>
  <tag2>
    .....
  </tag2>
  <tag2>
    .....
  </tag2>
  <unknown_tag1>
    .....
  </unknown_tag1>
</parent_tag>"""

xmld = xmltodict.parse(my_xml)

child_tags = xmld['parent_tag'].keys()

for child_tag in child_tags:
    print(child_tag)

The output will look like this:

tag1
tag2
tag3
unknown_tag1
unknown_tag2
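
Each repeated tag shows up only once in that output, so the number of occurrences is not visible there. If you also want to know how often each tag appears, and prefer to stay with the standard library, something along these lines should work. This is a sketch using ElementTree and collections.Counter rather than xmltodict, and the placeholder text inside the tags is invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Same tag layout as the example above, with made-up placeholder contents.
SAMPLE_XML = """<parent_tag>
  <tag1>x</tag1>
  <tag2>x</tag2>
  <tag3>x</tag3>
  <unknown_tag1>x</unknown_tag1>
  <unknown_tag2>x</unknown_tag2>
  <tag2>x</tag2>
  <tag2>x</tag2>
  <unknown_tag1>x</unknown_tag1>
</parent_tag>"""

root = ET.fromstring(SAMPLE_XML)

# Count how many times each direct child tag occurs.
# On Python 3.7+ a Counter keeps keys in first-seen order.
tag_counts = Counter(child.tag for child in root)

for tag, count in tag_counts.items():
    print(tag, count)
```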

Upvotes: 1
