Get all the children from xml tag using ElementTree

Question

I am trying to parse an XML file using ElementTree, but at some point I get only the first child instead of all the children inside a tag. My XML structure is as follows:

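    <sentences>
        <sentence>
            <text>I charge it at night and skip taking the cord with me because of the good battery life.</text>
            <aspectTerms>
                <aspectTerm term="cord"/>
                <aspectTerm term="battery life"/>
            </aspectTerms>
        </sentence>
        <sentence>
            <text>I bought a HP Pavilion DV4-1222nr laptop and have had so many problems with the computer.</text>
        </sentence>
        <sentence>
            <text>The tech guy then said the service center does not do 1-to-1 exchange and I have to direct my concern to the "sales" team, which is the retail shop which I bought my netbook from.</text>
            <aspectTerms>
                <aspectTerm term="service center"/>
                <aspectTerm term="&quot;sales&quot; team"/>
                <aspectTerm term="tech guy"/>
            </aspectTerms>
        </sentence>
    </sentences>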

I want to get the 'term' attribute of every 'aspectTerm' tag. Here is my code for that:

    import xml.etree.ElementTree as ET
    tree = ET.parse('Laptops_Train.xml')
    root = tree.getroot()
    df = pd.DataFrame()

    def getAspect(sentences):
        reviewList = []
        text = sentence.find('text').text
        reviewList.append(text)
        for aspectTerms in sentence.iter('aspectTerms'):
            #for aspectTerm in aspectTerms.iter('aspectTerm'): 
            aspect = aspectTerms.find('aspectTerm').get('term')
            print(aspect)
            return aspect

    aspectList = []
    for sentences in root.iter('sentences'):
        for sentence in sentences.iter('sentence'):
            aspectList.append(getAspect(sentence))

Actual Results:

cord
class 'NoneType'
service center

Expected Result:

[cord, battery life]
[]
[service center,"sales" team, tech guy]

Thanks in advance

Bill Bell · Accepted Answer

This is much easier to do with the lxml library, which has full XPath support.

>>> from lxml import etree
>>> tree = etree.parse('Laptops_Train.xml')
>>> for aspectTerms in tree.xpath('.//aspectTerms'):
...     aspectTerms.xpath('aspectTerm/@term')
... 
['cord', 'battery life']
['service center', '"sales" team', 'tech guy']

Notice too that every aspectTerm has a 'term' attribute; there are no empty ones that would give rise to None.
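
As for why the code in the question printed only the first term: find() returns just the first matching child, whereas findall() returns every match. A minimal sketch with the standard library follows; the inline XML is a cut-down, hypothetical stand-in for one block of Laptops_Train.xml, not the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-in for one <aspectTerms> block of the real file.
SAMPLE = """
<aspectTerms>
    <aspectTerm term="cord"/>
    <aspectTerm term="battery life"/>
</aspectTerms>
"""

aspect_terms = ET.fromstring(SAMPLE)

# find() stops at the first matching direct child.
first_only = aspect_terms.find('aspectTerm').get('term')

# findall() returns every matching direct child, in document order.
all_terms = [a.get('term') for a in aspect_terms.findall('aspectTerm')]

print(first_only)  # cord
print(all_terms)   # ['cord', 'battery life']
```

The None in the question's output most likely comes from the sentence without any aspectTerms: the loop body never runs there, so the function falls off the end and returns None implicitly.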

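Edit, inspired by a comment: to get one list per sentence, including an empty list for sentences that have no aspect terms, iterate over the sentence elements instead.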

>>> from lxml import etree
>>> tree = etree.parse('Laptops_Train.xml')
>>> for sentence in tree.xpath('.//sentence'):
...     sentence.xpath('.//aspectTerm/@term')
... 
['cord', 'battery life']
[]
['service center', '"sales" team', 'tech guy']
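
If you would rather stay with the standard library, roughly the same per-sentence result should be obtainable with ElementTree alone. This is only a sketch: the inline sample is a hypothetical, shortened version of the structure described in the question (attributes other than 'term' are left out), and get_aspects is an illustrative helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure described in the question.
SAMPLE = """
<sentences>
    <sentence>
        <text>I charge it at night and skip taking the cord with me because of the good battery life.</text>
        <aspectTerms>
            <aspectTerm term="cord"/>
            <aspectTerm term="battery life"/>
        </aspectTerms>
    </sentence>
    <sentence>
        <text>I bought a HP Pavilion DV4-1222nr laptop and have had so many problems with the computer.</text>
    </sentence>
</sentences>
"""


def get_aspects(sentence):
    """Return the 'term' of every aspectTerm under one sentence element."""
    # iter() walks all descendants, so an absent <aspectTerms> simply
    # yields an empty list instead of None.
    return [a.get('term') for a in sentence.iter('aspectTerm')]


root = ET.fromstring(SAMPLE)
aspect_list = [get_aspects(s) for s in root.iter('sentence')]
print(aspect_list)  # [['cord', 'battery life'], []]
```

For the real file, replace ET.fromstring(SAMPLE) with ET.parse('Laptops_Train.xml').getroot().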
