All Іѕ Vаиітy

Reputation: 26442

Python breaks parsing json with characters \"

I'm trying to parse a JSON string that contains an escape character (of some sort, I guess):

{
    "publisher": "\"O'Reilly Media, Inc.\""
}

The parser works fine if I remove the \" sequences from the string.

The exceptions raised by the different parsers are:

json

  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 17 column 20 (char 392)

ujson

ValueError: Unexpected character in found when decoding object value

How do I make the parser handle these escaped characters?

Update: [screenshot not preserved] P.S. json is imported as ujson in this example.

[screenshot not preserved]

This is what my IDE shows.

The comma was added accidentally; the actual JSON has no trailing comma and is valid.

[screenshot not preserved]

The string definition.

Upvotes: 2

Views: 6653

Answers (2)

Shawn Mehan

Reputation: 4568

Your JSON is invalid. If you have questions about your JSON objects, you can always validate them with JSONlint. In your case you have an object

{
"publisher": "\"O'Reilly Media, Inc.\"",
}

and you have an extra comma indicating that something else should be coming. So JSONlint yields

Parse error on line 2:
...edia, Inc.\"", }
---------------------^
Expecting 'STRING'

which would begin to help you find where the error was.

Removing the comma for

{
"publisher": "\"O'Reilly Media, Inc.\""
}

yields

Valid JSON
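The same check can be run from Python itself. What follows is a minimal sketch in Python 3 syntax, which is an assumption on my part: the question uses Python 2.7, where `json.loads` raises a plain `ValueError` without the position attributes. `check_json` is a hypothetical helper name, not part of any library.

```python
import json


def check_json(text):
    """Return (True, None) if text is valid JSON, else (False, (line, column))."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno and colno are the 1-based position reported by the decoder
        return False, (exc.lineno, exc.colno)
    return True, None


# Raw strings keep the \" sequences intact for the JSON parser.
with_trailing_comma = r'''{
"publisher": "\"O'Reilly Media, Inc.\"",
}'''
without_trailing_comma = r'''{
"publisher": "\"O'Reilly Media, Inc.\""
}'''

print(check_json(with_trailing_comma))
print(check_json(without_trailing_comma))
```

The exact position and message reported for the trailing comma vary between Python versions, so the sketch only returns them and does not rely on specific values.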

Update: I'm keeping the JSONlint material in, as it may be helpful to others in the future. As for your well-formed JSON object, I have

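import json

# A Python dict literal (not a JSON string): Python itself turns \" into a plain quote.
d = {
    "publisher": "\"O'Reilly Media, Inc.\""
    }

print("Here is your string parsed.")
# json.dumps re-escapes the embedded quotes when serializing.
print(json.dumps(d))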

yielding

Here is your string parsed.
{"publisher": "\"O'Reilly Media, Inc.\""}

Process finished with exit code 0

Upvotes: 2

Martijn Pieters

Reputation: 1122492

You almost certainly did not escape the backslashes properly when defining the string in Python. If you define the string properly, the JSON parses just fine:

>>> import json
>>> json_str = r'''
... {
...     "publisher": "\"O'Reilly Media, Inc.\""
... }
... '''  # raw string to prevent the \" from being interpreted by Python
>>> json.loads(json_str)
{u'publisher': u'"O\'Reilly Media, Inc."'}

Note that I used a raw string literal to define the string in Python; if I did not, the \" would be interpreted by Python and a regular " would be inserted. You'd have to double the backslash otherwise:

>>> print '\"'
"
>>> print '\\"'
\"
>>> print r'\"'
\"

Reencoding the parsed Python structure back to JSON shows the backslashes re-appearing, with the repr() output for the string using the same double backslash:

>>> json.dumps(json.loads(json_str))
'{"publisher": "\\"O\'Reilly Media, Inc.\\""}'
>>> print json.dumps(json.loads(json_str))
{"publisher": "\"O'Reilly Media, Inc.\""}
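The difference between the two outputs above comes from `repr()` versus `print`, not from the data. A small sketch, in Python 3 syntax (an assumption; the session above is Python 2.7), that counts the backslashes in each form:

```python
import json

# The parsed value contains plain double quotes and no backslashes.
parsed = {"publisher": "\"O'Reilly Media, Inc.\""}

encoded = json.dumps(parsed)

# The JSON text holds one backslash in front of each embedded quote.
print(encoded)
print(encoded.count("\\"))

# repr() shows the Python literal for that text, so each backslash is doubled
# and the apostrophe gains an escape of its own.
print(repr(encoded))
print(repr(encoded).count("\\"))
```

Only the first count describes the JSON document itself; the second describes how Python displays it as a string literal.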

If you do not escape the backslash, you'll end up with unescaped quotes:

>>> json_str_improper = '''
... {
...     "publisher": "\"O'Reilly Media, Inc.\""
... }
... '''
>>> print json_str_improper

{
    "publisher": ""O'Reilly Media, Inc.""
}

>>> json.loads(json_str_improper)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 3 column 20 (char 22)

Note that the \" sequences are now printed as plain " characters; the backslashes are gone!
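To see both cases side by side, here is a short sketch in Python 3 syntax (an assumption; the traceback above is from Python 2.7). The two literals have identical source text and differ only in the `r` prefix; the variable names are illustrative.

```python
import json

# Same source text twice; only the r prefix differs.
raw_literal = r'''{"publisher": "\"O'Reilly Media, Inc.\""}'''
plain_literal = '''{"publisher": "\"O'Reilly Media, Inc.\""}'''

# Python consumed the two backslashes in the non-raw literal,
# so it is two characters shorter.
print(len(raw_literal) - len(plain_literal))

for name, text in [("raw", raw_literal), ("plain", plain_literal)]:
    try:
        result = json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        result = "failed: {}".format(exc)
    print(name, "->", result)
```

If the JSON text comes from a file or a network response rather than a literal in your source code, Python's string-literal escape processing never touches it, so no extra escaping is needed there.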

Upvotes: 9
