Reputation: 1
I'm trying to access information within an XML file via Python's ElementTree. The XML looks like this:
<events-data>
<dossier-event event-type="new" id="EVT_4573534">
<event-date>
<date>20220816</date>
</event-date>
<event-code>EPIDOSNWIAI</event-code>
<event-text event-text-type="DESCRIPTION">text</event-text>
</dossier-event>
</events-data>
<events-data>
<dossier-event event-type="new" id="EVT_4573535">
<event-date>
<date>20220402</date>
</event-date>
<event-code>EPIDOS PCT</event-code>
<event-text event-text-type="DESCRIPTION">text1</event-text>
</dossier-event>
</events-data>
I want to access <date>20220402</date> and retrieve the date, i.e. 20220402. My attempt looks like this:
import xml.etree.ElementTree as ET

root_events = ET.fromstring(response_events.content)
for element in root_events.iter('{http://myapi/register}date'):
    print(element.text)
The problem: there is an unknown number of <date>[date]</date> elements before and after this date that are not within <events-data> or <event-date>. But if I try to list all tags, attributes, or text of <event-date>, it's empty. Can someone explain how I can access only the dates within something like
<event-date>
<date>20220402</date>
</event-date>
Upvotes: 0
Views: 56
Reputation: 2020
If you fix your XML to have a single root element, an XPath query might work for you:
import xml.etree.ElementTree as ET

for event_date in ET.parse("sample.xml").getroot().findall(".//events-data/dossier-event/event-date/date"):
    print(event_date.text)
Produces the following output:
$ python sample.py
20220816
20220402
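The query above matches the sample as posted, which has no namespace. Your own attempt uses '{http://myapi/register}date', so the real response probably puts its elements in that namespace, and un-prefixed paths would then match nothing. Below is a minimal sketch under two assumptions: that every element shares that namespace, and that the fragments are wrapped in a single root. The wrapper name root, the prefix reg, the helper event_dates, and the stray <date> in the sample are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI copied from the iter() call in the question.
NS_URI = "http://myapi/register"
NS = {"reg": NS_URI}  # "reg" is an arbitrary, hypothetical prefix

# Hypothetical fragment without a root: one stray <date> outside
# <event-date>, followed by the two events from the question.
FRAGMENT = """
<date>19990101</date>
<events-data>
  <dossier-event event-type="new" id="EVT_4573534">
    <event-date><date>20220816</date></event-date>
  </dossier-event>
</events-data>
<events-data>
  <dossier-event event-type="new" id="EVT_4573535">
    <event-date><date>20220402</date></event-date>
  </dossier-event>
</events-data>
"""


def event_dates(fragment):
    """Return the text of every <date> that is a direct child of <event-date>."""
    # Wrap the fragment in a single root that declares the default namespace,
    # so the parser accepts it and all children inherit that namespace.
    wrapped = f'<root xmlns="{NS_URI}">{fragment}</root>'
    root = ET.fromstring(wrapped)
    # The prefix in the path is resolved through the NS mapping.
    return [d.text for d in root.findall(".//reg:event-date/reg:date", NS)]


print(event_dates(FRAGMENT))
```

Because the path requires event-date as the parent, the stray 19990101 is skipped. If your real response already has a root element, you can drop the wrapping step and call findall on the parsed root directly.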
Upvotes: 1