f.rodrigues

Reputation: 3587

Parsing XML using json raises ValueError

I'm trying to parse an XML file using xml.etree.ElementTree and json:

from xml.etree import ElementTree as et
import json

def parse_file(file_name):
    tree = et.ElementTree()
    npcs = {}
    for npc in tree.parse(file_name):
        quests = []
        for quest in npc:
            quest_name = quest.attrib['name']
            stages = []
            for i, stage in enumerate(quest):
                next_stage, choice, npc_condition = None, None, None
                for key, val in stage.attrib.items():
                    val = json.loads(val)
                    if key == 'choices':
                        choice = val
                    elif key == 'next_stage':
                        next_stage = val
                    elif key == 'ncp_condition':
                        npc_condition = {stage.attrib['npc_name']: val}
                stages.append([i, next_stage, choice, npc_condition])
            quests.append( {quest_name:stages})
        npcs[npc.attrib['name']] = quests
    return npcs

The XML file:

<?xml version="1.0" encoding="utf-8"?>
<npcs>
    <npc name="NPC NAME">
        <quest0 name="Quest Name here">
            <stage0 choices='{"Option1":1, "Option1":2}'>
                <text>text1</text> 
            </stage0>
            <stage1 next_stage="[3,4]">
                <text>text2</text> 
            </stage1>
            <stage3 npc_name="other_npc_name" ncp_condition='{"some_condition":false}' next_stage="[3, 4]">
                <text>text3</text>
            </stage3>
        </quest0>
    </npc>
</npcs>

But I'm having trouble with this bit:

<stage3 npc_name="other_npc_name" ncp_condition='{"some_condition":false}' next_stage="[3, 4]">

Traceback:

Traceback (most recent call last):
  File "C:/.../test2.py", line 28, in <module>
    parse_file('quests.xml')
  File "C:/.../test2.py", line 15, in parse_file
    val = json.loads(val)
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python27\lib\json\decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

It raises this error on the line val = json.loads(val) when key="npc_name" and val="other_npc_name".

What's wrong with that? It didn't raise any error when name="some string", but it does when npc_name="some string".

I noticed that if I change "other_npc_name" to '"other_npc_name"' it doesn't complain, but this seems a bit hackish to me.

Upvotes: 0

Views: 54

Answers (1)

Raceyman

Reputation: 1354

JSON is a format for serializing data structures, so json.loads can only decode text that is a valid JSON value.

When you try to get JSON to decode something like this:

other_npc_name

the decoder can't match it to any JSON value type (object, array, string, number, true, false or null). However, if the same text is wrapped in double quotes:

"other_npc_name"

the decoder recognizes it as a string, because the JSON spec defines a string as text enclosed in double quotes.

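That is what happens in your script. The loop over stage.attrib.items() passes every attribute value to json.loads, including npc_name, whose value is a bare word. The name attributes on the npc and quest elements are read directly with attrib['name'] and never reach json.loads, which is why they raise no error. The difference boils down to this: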

import json
print(json.loads('"other_npc_name"'))  # quoted: decodes to the Unicode string other_npc_name
print(json.loads("other_npc_name"))  # bare word: raises ValueError

Wrapping the string this way may seem 'hackish', but it is the only form in which JSON can decode it.
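
To see why the quotes are not a hack, it can help to look at what json.dumps produces for a plain string. The following is a minimal sketch (the variable names are only illustrative), written so that it should run on both Python 2.7 and Python 3:

```python
import json

# Encoding a Python string as JSON adds the surrounding double quotes.
encoded = json.dumps("other_npc_name")
decoded = json.loads(encoded)

# The bare word, without quotes, is not a JSON value at all.
try:
    json.loads("other_npc_name")
    bare_is_valid = True
except ValueError:  # on Python 3, json.JSONDecodeError is a subclass of ValueError
    bare_is_valid = False

print(encoded)        # "other_npc_name"  (quotes are part of the JSON text)
print(decoded)        # other_npc_name
print(bare_is_valid)  # False
```

In other words, the quoted form is simply what the string looks like once it has been encoded as JSON.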

If the npc_name attribute in your XML is always a plain string, a cleaner option is to read it directly as a string instead of passing it through json.loads.
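
A minimal sketch of that suggestion, assuming the only JSON-valued attributes are the three that appear in the sample XML. JSON_ATTRS and decode_stage_attribs are hypothetical names, not part of the original script:

```python
import json
from xml.etree import ElementTree as et

# Assumption: only these attributes hold JSON (taken from the sample XML,
# including its 'ncp_condition' spelling). Everything else stays a string.
JSON_ATTRS = {'choices', 'next_stage', 'ncp_condition'}

def decode_stage_attribs(stage):
    """Return the element's attributes, JSON-decoding only those in JSON_ATTRS."""
    decoded = {}
    for key, val in stage.attrib.items():
        decoded[key] = json.loads(val) if key in JSON_ATTRS else val
    return decoded

# The problematic element from the question, as a self-closing tag.
STAGE_XML = """<stage3 npc_name="other_npc_name" ncp_condition='{"some_condition":false}' next_stage="[3, 4]"/>"""

attrs = decode_stage_attribs(et.fromstring(STAGE_XML))
print(attrs['npc_name'])       # plain string, never passed to json.loads
print(attrs['ncp_condition'])  # decoded JSON object
print(attrs['next_stage'])     # decoded JSON array
```

Inside parse_file, the same key check could replace the unconditional val = json.loads(val) call, so that npc_name is never handed to the JSON decoder.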

Upvotes: 1
