Knokkelgeddon

Reputation: 211

Error with xmltodict

EDIT: I can print rev['contributor'] for a while, but then every attempt to access rev['contributor'] raises the following:

 TypeError: string indices must be integers

ORIGINAL POST: I'm trying to extract data from an XML file using xmltodict with this code:

import xmltodict, json

with open('Sockpuppet_articles.xml', encoding='utf-8') as xml_file:
    dic_xml = xmltodict.parse(xml_file.read(), xml_attribs=False)
    print("parsed")
    for page in dic_xml['mediawiki']['page']:
        for rev in page['revision']:
            for user in open("Sockpuppet_names.txt", "r", encoding='utf-8'):
                user = user.strip()

                if 'username' in rev['contributor'] and rev['contributor']['username'] == user:
                    dosomething()

I get this error on the last line, the one with the if statement:

TypeError: string indices must be integers

Weird thing is, it works on another xml-file.

Upvotes: 1

Views: 1085

Answers (1)

Mello

Reputation: 61

I got the same error when the next level has only one element.

...

## Read XML
pastas = [os.path.join(caminho, name) for name in os.listdir(caminho)]
pastas = filter(os.path.isdir, pastas)
for pasta in pastas:
    for arq in glob.glob(os.path.join(pasta, "*.xml")):
        xmlData = codecs.open(arq, 'r', encoding='utf8').read()
        xmlDict = xmltodict.parse(xmlData, xml_attribs=True)["XMLBIBLE"]
        bible_name = xmlDict["@biblename"]
        list_verse = []
        for xml_inBook in xmlDict["BIBLEBOOK"]:
            bnumber = xml_inBook["@bnumber"]
            bname = xml_inBook["@bname"]
            for xml_chapter in xml_inBook["CHAPTER"]:
                cnumber = xml_chapter["@cnumber"]
                for xml_verse in xml_chapter["VERS"]:
                    vnumber = xml_verse["@vnumber"]
                    vtext = xml_verse["#text"]
...


TypeError: string indices must be integers

The error occurs when the book is "Obadiah". It has only one chapter.

[image: xml_inBook]

Clicking the CHAPTER value shows the following view, so you would expect xml_chapter to look the same. That is true only if the book has more than one chapter:

[image]

But when the book has only one chapter, the loop yields the string "@cnumber" (a key of the OrderedDict) instead of an OrderedDict.
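A minimal sketch of why this happens, using a hand-built OrderedDict as a stand-in for the parsed single chapter (the data is made up for illustration, not real parser output). Iterating over a dict yields its keys, and indexing one of those string keys with another string raises the TypeError:

```python
from collections import OrderedDict

# Hand-built stand-in for a single parsed <CHAPTER> (illustrative data only).
single_chapter = OrderedDict([
    ("@cnumber", "1"),
    ("VERS", [OrderedDict([("@vnumber", "1"), ("#text", "placeholder")])]),
])

# Looping over a dict gives its keys, not nested dicts.
keys_seen = [item for item in single_chapter]

# Indexing a string key with a string is what triggers the error.
try:
    keys_seen[0]["@cnumber"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(keys_seen)
print(error_message)
```

The same thing would happen in the question's loop if a page has only one revision: rev would be a key name such as 'contributor', not a revision dict.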

I solved that by converting the OrderedDict to a list when the book has only one chapter.

...

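            # A lone <CHAPTER> is parsed as a dict, not a list. Checking
            # len(...) == 2 would also match a book with exactly two chapters.
            if isinstance(xml_inBook["CHAPTER"], dict):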
                xml_chapter = list(xml_inBook["CHAPTER"].items())
                cnumber = xml_chapter[0][1]
                for xml_verse in xml_chapter[1][1]:
                    vnumber = xml_verse["@vnumber"]
                    vtext = xml_verse["#text"]
...
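An alternative that avoids a separate branch is to normalise every level to a list before looping. This is only a sketch: as_list and verse_refs are hypothetical helper names, and the two books are hand-built stand-ins for parsed data, not real output.

```python
def as_list(value):
    """Wrap a lone parsed element in a list; leave real lists untouched."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Hand-built stand-ins for parsed books (illustrative data only).
one_chapter_book = {
    "@bname": "Obadiah",
    "CHAPTER": {  # single chapter: a dict, not a list
        "@cnumber": "1",
        "VERS": [
            {"@vnumber": "1", "#text": "placeholder"},
            {"@vnumber": "2", "#text": "placeholder"},
        ],
    },
}
two_chapter_book = {
    "@bname": "Example",
    "CHAPTER": [
        # single verse: a dict, not a list
        {"@cnumber": "1", "VERS": {"@vnumber": "1", "#text": "placeholder"}},
        {"@cnumber": "2", "VERS": [{"@vnumber": "1", "#text": "placeholder"}]},
    ],
}


def verse_refs(book):
    """Collect (chapter number, verse number) pairs from one parsed book."""
    refs = []
    for chapter in as_list(book["CHAPTER"]):
        for verse in as_list(chapter["VERS"]):
            refs.append((chapter["@cnumber"], verse["@vnumber"]))
    return refs


one_refs = verse_refs(one_chapter_book)
two_refs = verse_refs(two_chapter_book)
print(one_refs)
print(two_refs)
```

xmltodict.parse also accepts a force_list argument, for example force_list=("CHAPTER", "VERS"), which should make those elements come back as lists even when there is only one of them. I have not tried it on this file.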

I am using Python 3.6.

Upvotes: 2
