PracticingPython

Reputation: 67

Extracting attributes from elements in xml

I have a script that extracts the text and attributes from a number of xpaths. Each entry's data is appended to a list as it is extracted (all attributes, followed by the text, before moving on to the next xpath), and then that list is inserted into a data frame as a row. My problem is that not every entry has the same attributes per xpath. For example, all entries have the cat element with at least one attribute (color), but some cat elements have one or more additional attributes that other cat elements lack. This presents an issue when the row is inserted into the data frame, because its length won't match the number of columns. The order of the attributes stays uniform unless one is missing. I need a way to insert a blank string when an attribute is skipped because it is not present in an element.

import urllib.request
import xml.etree.ElementTree

# next_url_list, namespaces, xpaths and award are assumed to be defined earlier in the script
for next_url in next_url_list:
    response = urllib.request.urlopen(next_url)
    bytes_ = response.read()
    root = xml.etree.ElementTree.fromstring(bytes_)

    for count in range(0, len(root.findall("./xpath:entry", namespaces=namespaces))):

        for xpath in xpaths:
            try:
                attribs = list(root.findall(xpath, namespaces=namespaces)[count].attrib.keys())

                # append every attribute value that is present, then the element text
                for attrib in attribs:
                    award.append(root.findall(xpath, namespaces=namespaces)[count].attrib[attrib])

                award.append(root.findall(xpath, namespaces=namespaces)[count].text)

            except IndexError:
                pass

Upvotes: 1

Views: 77

Answers (1)

wwii

Reputation: 23783

I need a way to insert a blank string when an attribute is effectively skipped for not being in an element.

  • For each element, make a dictionary of the expected attributes, with an empty string for each value.
    • {'a1':'','a2':'',...}
  • When you extract an attribute from an element, update the corresponding dictionary value.
  • Use the dictionary to construct the row. Missing attributes will have empty strings as values.
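
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: the sample XML, the size attribute, the EXPECTED_ATTRIBS list, and the element_to_row helper are all hypothetical names made up for the example, and the real script would use its own xpaths and namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; "size" is an invented optional attribute.
SAMPLE = """<entries>
  <entry><cat color="black" size="small">Tom</cat></entry>
  <entry><cat color="white">Snowball</cat></entry>
</entries>"""

# Assumed: the full set of attributes is known up front, in column order.
EXPECTED_ATTRIBS = ["color", "size"]


def element_to_row(element, expected_attribs):
    """Return attribute values in a fixed order ('' where missing), then the text."""
    values = dict.fromkeys(expected_attribs, "")  # start with all blanks
    for name, value in element.attrib.items():
        if name in values:  # attributes without a column are ignored
            values[name] = value
    # An element with no text has .text == None; use '' for that case too.
    return [values[name] for name in expected_attribs] + [element.text or ""]


root = ET.fromstring(SAMPLE)
rows = [element_to_row(cat, EXPECTED_ATTRIBS) for cat in root.findall("./entry/cat")]
print(rows)
# [['black', 'small', 'Tom'], ['white', '', 'Snowball']]
```

Because every row now has the same length, it should line up with a fixed set of data frame columns (here color, size, and the text). If the expected attributes are known, looking each one up with element.attrib.get(name, "") would give the same result without the intermediate dictionary.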

Upvotes: 1
