Brian M. Hunt

Reputation: 83848

In Python, have json not escape a string

I am caching some JSON data, and in storage it is represented as a JSON-encoded string. The server performs no work on the JSON before sending it to the client, other than collating multiple cached objects, like this:

import json

def get_cached_items():
    # `cache` is the application's cache; each value is a JSON-encoded string
    item1 = cache.get(1)
    item2 = cache.get(2)
    return json.dumps(dict(item1=item1, item2=item2, msg="123"))

There may be other items included with the return value, in this case represented by msg="123".

The issue is that the cached items are double-escaped: they are already JSON strings, so json.dumps encodes them a second time as string values. It would behoove the library to let such a string pass through without being escaped.

I have looked at the documentation for the default argument of json.dumps, since that seems to be the place to address this, and searched Google and Stack Overflow, but found nothing useful.

It would be unfortunate, from a performance perspective, if I had to decode the JSON of each cached item only to re-encode it before sending it to the browser. It would be unfortunate, from a complexity perspective, not to be able to use json.dumps.

My inclination is to write a class that stores the cached string, so that when the default handler encounters an instance of this class it uses the string without escaping it. I have yet to figure out how to achieve this, though, and I would be grateful for thoughts and assistance.

EDIT: For clarity, here is an example of the proposed default technique:

import json

class RawJSON(object):
    def __init__(self, s):
        self.str = s

class JSONEncoderWithRaw(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, RawJSON):
            return o.str  # but avoid the call to `encode_basestring` (or its ASCII equivalent)
        return super(JSONEncoderWithRaw, self).default(o)

Here is a degenerate example of the above:

>>> import json
>>> class M(object):
...     str = ''
...
>>> m = M()
>>> m.str = json.dumps(dict(x=123))
>>> json.dumps(dict(a=m), default=lambda o: o.str)
'{"a": "{\\"x\\": 123}"}'

The desired output would embed the string m.str unescaped:

'{"a": {"x": 123}}'

It would be good if the json module did not encode/escape the return value of the default function, or if that escaping could otherwise be avoided. Absent a way to do this through the default parameter, one may have to override the encode and iterencode methods of JSONEncoder, which brings challenges in terms of complexity, interoperability, and performance.

Upvotes: 6

Views: 8600

Answers (3)

Emile

Reputation: 1235

You can use the better-maintained simplejson instead of json; it provides this functionality through its RawJSON class.

import simplejson as json
from simplejson.encoder import RawJSON

print(json.dumps([1, RawJSON(u'["abc", 2]'), u'["def", 3]']))
# -> [1, ["abc", 2], "[\"def\", 3]"]

You get simplicity of code, plus all the C optimisations of simplejson.

Upvotes: 2

jfs

Reputation: 414795

A quick-and-dirty way is to patch the json.encoder.encode_basestring*() functions:

import json

class RawJson(str):  # on Python 2, subclass `unicode` instead
    pass

# patch the json.encoder module so RawJson values are emitted unescaped
for name in ['encode_basestring', 'encode_basestring_ascii']:
    def encode(o, _encode=getattr(json.encoder, name)):
        return o if isinstance(o, RawJson) else _encode(o)
    setattr(json.encoder, name, encode)


print(json.dumps([1, RawJson(u'["abc", 2]'), u'["def", 3]']))
# -> [1, ["abc", 2], "[\"def\", 3]"]
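Because the patch above replaces the functions for the whole process, every other user of json in the program is affected too. One way to limit that is to apply the patch only inside a context manager and restore the originals on exit. This is a sketch assuming Python 3 (so RawJson subclasses str) and that no other thread encodes JSON while the patch is active; raw_json_passthrough is a hypothetical name, not part of the json module:

```python
import contextlib
import json


class RawJson(str):
    """A str that already contains encoded JSON (Python 3)."""


@contextlib.contextmanager
def raw_json_passthrough():
    """Temporarily patch json.encoder so RawJson values are emitted verbatim."""
    names = ("encode_basestring", "encode_basestring_ascii")
    originals = {name: getattr(json.encoder, name) for name in names}

    def passthrough(original):
        # Wrap one original encoder: RawJson passes through, other strings are escaped.
        def encode(o):
            return o if isinstance(o, RawJson) else original(o)
        return encode

    for name, original in originals.items():
        setattr(json.encoder, name, passthrough(original))
    try:
        yield
    finally:
        # Restore the original (usually C-accelerated) string encoders.
        for name, original in originals.items():
            setattr(json.encoder, name, original)
```

Usage: inside `with raw_json_passthrough(): json.dumps([1, RawJson('["abc", 2]')])` the RawJson value is embedded as-is; outside the with block it is escaped like any other string again.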

Upvotes: 6

Martijn Pieters

Reputation: 1124518

If you are caching JSON strings, you need to decode them to Python structures first; there is no way for json.dumps() to distinguish between normal strings and strings that are really JSON-encoded structures:

return json.dumps({'item1': json.loads(item1), 'item2': json.loads(item2), 'msg': "123"})
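Here is that approach dropped into the question's get_cached_items() as a self-contained sketch; CACHE is a hypothetical plain dict standing in for the real cache and holding JSON strings:

```python
import json

# Hypothetical stand-in for the real cache: values are JSON-encoded strings.
CACHE = {1: '{"x": 123}', 2: '["abc", 2]'}


def get_cached_items():
    # Decode each cached string first, so json.dumps() nests the data as
    # JSON objects/arrays instead of escaping it as string values.
    item1 = json.loads(CACHE[1])
    item2 = json.loads(CACHE[2])
    return json.dumps({'item1': item1, 'item2': item2, 'msg': "123"})
```

The cost is one json.loads() per cached item on every call, which is exactly the overhead the question hoped to avoid.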

Unfortunately, there is no option to include already-converted JSON data in this; the default function is expected to return Python values. You extract data from whatever object is passed in and return a value that can be converted to JSON, not a value that is already JSON itself.

The only other approach I can see is to insert "template" values, then use string replacement techniques to manipulate the JSON output to replace the templates with your actual cached data:

json_data = json.dumps({'item1': '==item1==', 'item2': '==item2==', 'msg': "123"})
return json_data.replace('"==item1=="', item1).replace('"==item2=="', item2)
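The template idea can also be folded into a JSONEncoder subclass, so callers wrap cached strings instead of managing template names by hand. This is a sketch, not a library feature: RawJSON and RawJSONEncoder are hypothetical names, it assumes Python 3, and it only supports json.dumps()/encode(), not streaming via json.dump() or iterencode():

```python
import json
import uuid


class RawJSON(object):
    """Hypothetical wrapper marking a string as already-encoded JSON."""

    def __init__(self, text):
        self.text = text


class RawJSONEncoder(json.JSONEncoder):
    """Sketch: emit a unique placeholder for each RawJSON during encoding,
    then splice the raw JSON text into the finished output.

    Only encode() (and therefore json.dumps) is supported; iterencode()
    and json.dump() are not.
    """

    def encode(self, o):
        self._raw = {}  # placeholder token -> raw JSON text
        text = super().encode(o)
        for token, raw in self._raw.items():
            # Each placeholder was serialized as a quoted JSON string.
            text = text.replace('"%s"' % token, raw)
        return text

    def default(self, o):
        if isinstance(o, RawJSON):
            token = "@@raw-%s@@" % uuid.uuid4().hex
            self._raw[token] = o.text
            return token
        return super().default(o)
```

Usage: `json.dumps({'item1': RawJSON(item1), 'msg': "123"}, cls=RawJSONEncoder)`. The random tokens are very unlikely to collide with real string values, and the wrapped text is inserted verbatim, so it must already be valid JSON.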

A third option is to cache item1 and item2 in non-serialized form, as a Python structure instead of a JSON string.

Upvotes: 4
