Reputation: 67
I have a script that extracts the text and attributes from a number of XPaths. Each entry's data is appended to a list as it is extracted (all attributes, followed by the text, before moving on to the next XPath), and then that list is inserted into a data frame. My problem is that not every entry has the same attributes per XPath. For example, all entries have the cat element with at least one corresponding attribute (color), but some cat elements have additional attributes that not all cat elements have. This is a problem when the row is inserted into the data frame, because its length won't match the number of columns. The order of the attributes stays uniform unless one is missing. I need a way to insert a blank string when an attribute is effectively skipped for not being in an element.
for next_url in next_url_list:
    response = urllib.request.urlopen(next_url)
    bytes_ = response.read()
    root = xml.etree.ElementTree.fromstring(bytes_)
    for count in range(0, len(root.findall("./xpath:entry", namespaces=namespaces))):
        for xpath in xpaths:
            try:
                attribs = list(root.findall(xpath, namespaces=namespaces)[count].attrib.keys())
                for attrib in attribs:
                    award.append(root.findall(xpath, namespaces=namespaces)[count].attrib[attrib])
                award.append(root.findall(xpath, namespaces=namespaces)[count].text)
            except IndexError:
                pass
Upvotes: 1
Views: 77
Reputation: 23783
I need a way to insert a blank string when an attribute is effectively skipped for not being in an element.
{'a1':'','a2':'',...}
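As a rough sketch of how a dict of blank defaults like the one above could be used: start from the defaults, overlay whatever attributes the element actually has, then read the values back in a fixed order. The sample XML, the attribute name `size`, the `EXPECTED_ATTRIBS` list and the `element_row` helper are all made up for illustration, and namespaces are left out for brevity, so adapt it to your real XPaths.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second <cat> has no "size" attribute.
SAMPLE = """<feed>
  <entry><cat color="black" size="small">Tom</cat></entry>
  <entry><cat color="white">Snowball</cat></entry>
</feed>"""

# Assumption: you know the full, ordered list of attributes an element may carry.
EXPECTED_ATTRIBS = ["color", "size"]


def element_row(element, expected_attribs):
    """Return attribute values in a fixed order ('' if missing), then the text."""
    defaults = dict.fromkeys(expected_attribs, "")  # e.g. {'color': '', 'size': ''}
    merged = {**defaults, **element.attrib}         # real attributes override blanks
    values = [merged[name] for name in expected_attribs]
    values.append(element.text or "")               # text goes last, as in the question
    return values


root = ET.fromstring(SAMPLE)
rows = [element_row(cat, EXPECTED_ATTRIBS) for cat in root.findall("./entry/cat")]
```

Every row built this way has the same length (one slot per expected attribute plus the text), so it should line up with the data frame columns even when an attribute is absent.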
Upvotes: 1