Reputation: 211
EDIT: I can print rev['contributor'] for a while, but then every attempt to access rev['contributor'] raises the following:
TypeError: string indices must be integers
ORIGINAL POST: I'm trying to extract data from an XML file using xmltodict with this code:
import xmltodict, json
with open('Sockpuppet_articles.xml', encoding='utf-8') as xml_file:
    dic_xml = xmltodict.parse(xml_file.read(), xml_attribs=False)
print("parsed")
for page in dic_xml['mediawiki']['page']:
    for rev in page['revision']:
        for user in open("Sockpuppet_names.txt", "r", encoding='utf-8'):
            user = user.strip()
            if 'username' in rev['contributor'] and rev['contributor']['username'] == user:
                dosomething()
I get this error on the line with the if statement:
TypeError: string indices must be integers
The weird thing is that it works on another XML file.
Upvotes: 1
Views: 1085
Reputation: 61
I got the same error when the next level has only one element.
...
## Read XML
pastas = [os.path.join(caminho, name) for name in os.listdir(caminho)]
pastas = filter(os.path.isdir, pastas)
for pasta in pastas:
    for arq in glob.glob(os.path.join(pasta, "*.xml")):
        xmlData = codecs.open(arq, 'r', encoding='utf8').read()
        xmlDict = xmltodict.parse(xmlData, xml_attribs=True)["XMLBIBLE"]
        bible_name = xmlDict["@biblename"]
        list_verse = []
        for xml_inBook in xmlDict["BIBLEBOOK"]:
            bnumber = xml_inBook["@bnumber"]
            bname = xml_inBook["@bname"]
            for xml_chapter in xml_inBook["CHAPTER"]:
                cnumber = xml_chapter["@cnumber"]
                for xml_verse in xml_chapter["VERS"]:
                    vnumber = xml_verse["@vnumber"]
                    vtext = xml_verse["#text"]
...
TypeError: string indices must be integers
The error occurs when the book is "Obadiah". It has only one chapter.
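Clicking the CHAPTER value shows the following view, so you would expect xml_chapter to look the same. That is true only if the book has more than one chapter:
With a single chapter, CHAPTER is one OrderedDict rather than a list of them, so the loop iterates over its keys and yields the string "@cnumber" instead of an OrderedDict.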
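The behaviour can be reproduced without any XML. The sketch below uses plain dicts as hypothetical stand-ins for what xmltodict returns for a one-chapter and a two-chapter book (the second book name and all verse texts are placeholders, not taken from the real file). Iterating over the single dict walks its keys, which are strings:

```python
# Hypothetical stand-ins for xmltodict output; verse texts are placeholders.
one_chapter_book = {
    "@bname": "Obadiah",
    "CHAPTER": {"@cnumber": "1", "VERS": [{"@vnumber": "1", "#text": "..."}]},
}
two_chapter_book = {
    "@bname": "Haggai",
    "CHAPTER": [
        {"@cnumber": "1", "VERS": [{"@vnumber": "1", "#text": "..."}]},
        {"@cnumber": "2", "VERS": [{"@vnumber": "1", "#text": "..."}]},
    ],
}


def iterated_items(book):
    """Return what `for xml_chapter in book["CHAPTER"]` would yield."""
    return list(book["CHAPTER"])


def first_cnumber(book):
    """Mimic the failing loop body on the first iterated item."""
    first = iterated_items(book)[0]
    return first["@cnumber"]  # raises TypeError when `first` is a string


# A list of chapters yields dicts, so indexing by "@cnumber" works.
many = iterated_items(two_chapter_book)

# A lone chapter is a dict, so the loop yields its keys instead.
single = iterated_items(one_chapter_book)
```

Calling first_cnumber(one_chapter_book) ends up indexing the string "@cnumber" with another string, which is what raises the TypeError.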
I solved that by converting the OrderedDict to a list when the book has only one chapter.
...
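# A lone chapter is parsed as a single dict rather than a list of dicts.
if isinstance(xml_inBook["CHAPTER"], dict):
    xml_chapter = list(xml_inBook["CHAPTER"].items())
    cnumber = xml_chapter[0][1]          # value of "@cnumber"
    for xml_verse in xml_chapter[1][1]:  # value of "VERS"
        vnumber = xml_verse["@vnumber"]
        vtext = xml_verse["#text"]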
...
I am using Python 3.6.
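A more general variant of this workaround is to normalise every level before looping, so that books with one chapter and chapters with one verse take the same code path. This is only a minimal sketch: as_list and collect_verses are hypothetical helpers (not part of xmltodict), and the data is a placeholder standing in for the parsed file. xmltodict.parse also accepts a force_list argument meant for this situation; it is worth checking the documentation for the installed version before relying on it.

```python
def as_list(value):
    """Wrap a lone parsed element in a list; leave lists unchanged.

    Hypothetical helper, not part of xmltodict.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def collect_verses(book):
    """Return (cnumber, vnumber, text) tuples for one parsed BIBLEBOOK."""
    verses = []
    for chapter in as_list(book.get("CHAPTER")):
        for verse in as_list(chapter.get("VERS")):
            verses.append((chapter["@cnumber"], verse["@vnumber"], verse["#text"]))
    return verses


# Placeholder stand-in for a one-chapter, one-verse book.
book = {
    "@bname": "Obadiah",
    "CHAPTER": {"@cnumber": "1", "VERS": {"@vnumber": "1", "#text": "..."}},
}
result = collect_verses(book)
```

The same idea would probably apply to the loops in the question, for example iterating over as_list(page['revision']) so that a page with a single revision is handled like any other.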
Upvotes: 2