Reputation: 31
I'm trying to extract the following values (the code="a" subfield of every datafield with tag="993") from an XML file (https://digitallibrary.un.org/search?ln=en&p=A/RES/72/266&f=&rm=&ln=en&sf=&so=d&rg=50&c=United+Nations+Digital+Library+System&of=xm&fti=0&fti=0).
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    ...
    <datafield tag="993" ind1="2" ind2=" ">
      <subfield code="a">A/C.5/72/L.22</subfield> <!-- value to isolate: A/C.5/72/L.22 -->
    </datafield>
    <datafield tag="993" ind1="3" ind2=" ">
      <subfield code="a">A/72/682</subfield> <!-- value to isolate: A/72/682 -->
    </datafield>
    <datafield tag="993" ind1="4" ind2=" ">
      <subfield code="a">A/72/PV.76</subfield> <!-- value to isolate: A/72/PV.76 -->
    </datafield>
    ...
  </record>
  <record>
    ...
    <datafield tag="993" ind1="2" ind2=" ">
      <subfield code="a">A/C.5/72/L.22</subfield> <!-- value to isolate: A/C.5/72/L.22 -->
    </datafield>
    <datafield tag="993" ind1="3" ind2=" ">
      <subfield code="a">A/72/682</subfield> <!-- value to isolate: A/72/682 -->
    </datafield>
  </record>
  ...
</collection>
The code I've written only returns the first tag-993 value in each record:
for record in root:
    # find() returns only the first matching subfield in the record
    if record.find("{http://www.loc.gov/MARC21/slim}datafield[@tag='993']/{http://www.loc.gov/MARC21/slim}subfield[@code='a']") is not None:
        symbol = record.find("{http://www.loc.gov/MARC21/slim}datafield[@tag='993']/{http://www.loc.gov/MARC21/slim}subfield[@code='a']").text
        print(symbol)
Is there a way to loop over every matching element using ElementTree's XPath support? Thank you in advance.
Upvotes: 1
Views: 1537
Reputation: 5905
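To add to user3091877's answer, here is an alternative XPath expression (it requires a full XPath 1.0 engine such as lxml's xpath() method, since ElementTree's limited XPath support cannot evaluate name(), text() or the parent:: axis):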
//*[name()="subfield"][@code="a"][parent::*[@tag="993"]]/text()
EDIT: This one returns 6 values (@tag=993 and @ind1=3):
//*[name()="subfield"][parent::*[@tag="993" and @ind1="3"]]/text()
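With plain ElementTree, roughly the same selections can be written using a namespace map and chained attribute predicates in place of name(), text() and parent::. This is a minimal sketch on a hypothetical, trimmed sample; it assumes the records live in the MARC21 slim default namespace that the question's code uses. Unlike name()="subfield", the marc: prefix only matches elements in that namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample; assumes the MARC21 slim default namespace from the question
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="993" ind1="2" ind2=" "><subfield code="a">A/C.5/72/L.22</subfield></datafield>
    <datafield tag="993" ind1="3" ind2=" "><subfield code="a">A/72/682</subfield></datafield>
    <datafield tag="993" ind1="4" ind2=" "><subfield code="a">A/72/PV.76</subfield></datafield>
  </record>
  <record>
    <datafield tag="993" ind1="2" ind2=" "><subfield code="a">A/C.5/72/L.22</subfield></datafield>
    <datafield tag="993" ind1="3" ind2=" "><subfield code="a">A/72/682</subfield></datafield>
  </record>
</collection>"""

# Prefix map so paths can say marc:datafield instead of {uri}datafield
NS = {"marc": "http://www.loc.gov/MARC21/slim"}
root = ET.fromstring(SAMPLE)

# Roughly //*[name()="subfield"][@code="a"][parent::*[@tag="993"]]/text()
all_993 = [sf.text for sf in root.iterfind(
    ".//marc:datafield[@tag='993']/marc:subfield[@code='a']", NS)]

# Roughly //*[name()="subfield"][parent::*[@tag="993" and @ind1="3"]]/text()
# Consecutive predicates must all match, so they behave like "and"
ind1_3 = [sf.text for sf in root.iterfind(
    ".//marc:datafield[@tag='993'][@ind1='3']/marc:subfield", NS)]

print(all_993)
print(ind1_3)
```

Chaining [@tag='993'][@ind1='3'] is used here because ElementTree's predicate syntax has no and operator.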
Upvotes: 1
Reputation: 13
The docs show that .find() only gets the first matching subelement. It sounds like you want .findall().
The following seems to work for me:
import xml.etree.ElementTree as ET

tree = ET.parse(input_file)  # input_file: path to (or file object for) the XML export
root = tree.getroot()

# findall() returns a list (empty if nothing matches), so no None check is needed
xpath = "{http://www.loc.gov/MARC21/slim}datafield[@tag='993']/{http://www.loc.gov/MARC21/slim}subfield[@code='a']"
for record in root:
    for symbol in record.findall(xpath):
        print(symbol.text)
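To see the difference on a single record: find() returns one Element (or None), while findall() always returns a list, empty when nothing matches, so comparing its result with None never filters anything out. A minimal sketch on a hypothetical one-record sample, assuming the same MARC21 slim namespace as the question; the per_record grouping at the end is just one illustrative way to keep each record's symbols together.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-record sample; assumes the MARC21 slim namespace from the question
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="993" ind1="2" ind2=" "><subfield code="a">A/C.5/72/L.22</subfield></datafield>
    <datafield tag="993" ind1="3" ind2=" "><subfield code="a">A/72/682</subfield></datafield>
    <datafield tag="993" ind1="4" ind2=" "><subfield code="a">A/72/PV.76</subfield></datafield>
  </record>
</collection>"""

NS = "{http://www.loc.gov/MARC21/slim}"
XPATH = NS + "datafield[@tag='993']/" + NS + "subfield[@code='a']"

root = ET.fromstring(SAMPLE)
record = root[0]

first = record.find(XPATH)     # a single Element: only the first match
every = record.findall(XPATH)  # a list with every matching Element
missing = record.findall(NS + "datafield[@tag='999']")  # no match gives [], not None

print(first.text)
print([sf.text for sf in every])
print(missing)

# One list of symbols per <record>
per_record = [[sf.text for sf in rec.findall(XPATH)] for rec in root]
print(per_record)
```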
Upvotes: 0