Trend Freelancer

Reputation: 1

Accessing elements within an XML tree with Python ElementTree

I'm trying to access information within an XML file via Python's ElementTree. The XML looks like this:

   <events-data>
      <dossier-event event-type="new" id="EVT_4573534">
         <event-date>
            <date>20220816</date>
         </event-date>
         <event-code>EPIDOSNWIAI</event-code>
         <event-text event-text-type="DESCRIPTION">text</event-text>
      </dossier-event>
   </events-data>
   <events-data>
      <dossier-event event-type="new" id="EVT_4573535">
         <event-date>
            <date>20220402</date>
         </event-date>
         <event-code>EPIDOS PCT</event-code>
         <event-text event-text-type="DESCRIPTION">text1</event-text>
      </dossier-event>
   </events-data>

I want to access the <date>20220402</date> element and retrieve its text, 20220402. My attempt looks like this:

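import xml.etree.ElementTree as ET

# response_events: the HTTP response from the API (request not shown)
root_events = ET.fromstring(response_events.content)
for element in root_events.iter('{http://myapi/register}date'):
    print(element.text)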

The problem: there is an unknown number of <date>[date]</date> elements before and after this date that are not within <events-data> or <event-date>. But if I try to list all tags, attributes, or text of <event-date>, the result is empty. Can someone explain how I can access only the dates within something like this:

<event-date>
     <date>20220402</date>
</event-date>

Upvotes: 0

Views: 56

Answers (1)

j_b

Reputation: 2020

If you fix your XML to have a single root tag, an XPath query might work for you:

import xml.etree.ElementTree as ET

for event_date in ET.parse("sample.xml").getroot().findall(".//events-data/dossier-event/event-date/date"):
    print(event_date.text)

Produces the following output:

$ python sample.py
20220816
20220402
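
The code in the question iterates over '{http://myapi/register}date', which suggests the real response uses a default namespace. If so, the un-namespaced path above would match nothing. Here is a minimal sketch of a namespace-aware version, assuming the namespace URI from the question applies to every element. The <root> wrapper, the stray top-level <date>, the "reg" prefix, and the event_dates helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the namespace URI comes from the question's code;
# the <root> wrapper and the stray top-level <date> are assumptions.
SAMPLE = """\
<root xmlns="http://myapi/register">
   <date>20210101</date>
   <events-data>
      <dossier-event event-type="new" id="EVT_4573534">
         <event-date>
            <date>20220816</date>
         </event-date>
      </dossier-event>
   </events-data>
   <events-data>
      <dossier-event event-type="new" id="EVT_4573535">
         <event-date>
            <date>20220402</date>
         </event-date>
      </dossier-event>
   </events-data>
</root>
"""

# "reg" is an arbitrary local alias; only the URI has to match the document.
NS = {"reg": "http://myapi/register"}


def event_dates(xml_text):
    """Return the text of every <date> that is a direct child of <event-date>."""
    root = ET.fromstring(xml_text)
    return [d.text for d in root.findall(".//reg:event-date/reg:date", NS)]


print(event_dates(SAMPLE))
```

The stray <date>20210101</date> is skipped because its parent is not <event-date>. With the real response you could try event_dates(response_events.content). If the document turns out not to use a namespace, drop the reg: prefixes and the NS argument.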

Upvotes: 1
