Finding specific XML attribute of child element using Python?

Question


  
    
      
        
          
            Funding
            This work was supported by the NIH

I have an XML file of scientific journal metadata and am trying to extract just the funding information for each article. I need the text contained within the p tag. While the "sec id" varies between articles, the "sec-type" is always "funding".

I have been trying to do this in Python 3 using ElementTree.

import xml.etree.ElementTree as ET  

tree = ET.parse("journals.xml")
root = tree.getroot()
for title in root.iter("title"):
    ET.dump(title)

Any help would be greatly appreciated!

cody · Accepted Answer

You can use findall with an XPath expression to extract the values you want. I extrapolated from your example data a little bit in order to complete the document and have two p elements:


  
    
      
        
          
            Funding
            This work was supported by the NIH
          
          
            Funding
            I'm a little teapot

The following extracts the text content of every p node that is a direct child of a sec node with sec-type="funding":

import xml.etree.ElementTree as ET

doc = ET.parse('journals.xml')
print([p.text for p in doc.findall('.//sec[@sec-type="funding"]/p')])

Result:

['This work was supported by the NIH', "I'm a little teapot"]
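If a p element contains inline markup, p.text only returns the text before the first child tag. Here is a minimal, self-contained sketch of the same XPath lookup that joins all nested text with itertext(). The sample document is hypothetical: the article and back wrappers, the id values, and the bold tag are assumptions for illustration, not taken from the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the wrapper elements, id values and <bold> tag
# are assumptions; only sec, sec-type, title and p come from the question.
SAMPLE = """\
<article>
  <back>
    <sec id="s1" sec-type="funding">
      <title>Funding</title>
      <p>This work was supported by the <bold>NIH</bold></p>
    </sec>
    <sec id="s2" sec-type="conclusions">
      <title>Conclusions</title>
      <p>Not a funding statement</p>
    </sec>
  </back>
</article>
"""


def funding_statements(root):
    """Return the full text of each p directly under a funding sec."""
    return [
        # itertext() walks the element and its children, so text inside
        # nested tags such as <bold> is included.
        "".join(p.itertext()).strip()
        for p in root.findall('.//sec[@sec-type="funding"]/p')
    ]


root = ET.fromstring(SAMPLE)
print(funding_statements(root))
```

The predicate [@sec-type="funding"] filters on the attribute value, so the varying id attribute never needs to be known. For a file on disk, ET.parse('journals.xml').getroot() can be passed in instead of the parsed string.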
