Reputation: 83848
I am caching some JSON data, and in storage it is represented as a JSON-encoded string. No work is performed on the JSON by the server before sending it to the client, other than collation of multiple cached objects, like this:
def get_cached_items():
    item1 = cache.get(1)
    item2 = cache.get(2)
    # json.dumps takes a single object, so the items are collected in a dict
    return json.dumps(dict(item1=item1, item2=item2, msg="123"))
There may be other items included with the return value, in this case represented by msg="123".
The issue is that the cached items are double-escaped. It would behoove the library to allow a pass-through of the string without escaping it.
I have looked at the documentation for the json.dumps default argument, as it seems to be the place where one would address this, and searched on Google/SO but found no useful results.
It would be unfortunate, from a performance perspective, if I had to decode the JSON of each cached item to send it to the browser. It would be unfortunate from a complexity perspective to not be able to use json.dumps.
My inclination is to write a class that stores the cached string, so that when the default handler encounters an instance of this class it uses the string without performing any escaping. I have yet to figure out how to achieve this though, and I would be grateful for thoughts and assistance.
EDIT: For clarity, here is an example of the proposed default technique:
class RawJSON(object):
    def __init__(self, str):
        self.str = str

class JSONEncoderWithRaw(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, RawJSON):
            return o.str  # but avoid call to `encode_basestring` (or ASCII equiv.)
        return super(JSONEncoderWithRaw, self).default(o)
Here is a degenerate example of the above:
>>> class M():
...     str = ''
...
>>> m = M()
>>> m.str = json.dumps(dict(x=123))
>>> json.dumps(dict(a=m), default=lambda o: o.str)
'{"a": "{\\"x\\": 123}"}'
The desired output would include the unescaped string m.str, being:
'{"a": {"x": 123}}'
It would be good if the json module did not encode/escape the return value of the default parameter, or if the same could be avoided. In the absence of a method via the default parameter, one may have to achieve the objective here by overriding the encode and iterencode methods of JSONEncoder, which brings challenges in terms of complexity, interoperability, and performance.
Upvotes: 6
Views: 8600
Reputation: 1235
You can use the better-maintained simplejson instead of json, which provides this functionality.
import simplejson as json
from simplejson.encoder import RawJSON
print(json.dumps([1, RawJSON(u'["abc", 2]'), u'["def", 3]']))
# -> [1, ["abc", 2], "[\"def\", 3]"]
You get simplicity of code, plus all the C optimisations of simplejson.
Upvotes: 2
Reputation: 414795
A quick-n-dirty way is to patch the json.encoder.encode_basestring*() functions:
import json
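# Marker type (Python 2 `unicode`): instances are emitted verbatim
class RawJson(unicode):
    pass

# patch json.encoder module
for name in ['encode_basestring', 'encode_basestring_ascii']:
    # bind the original function as a default argument so each wrapper keeps its own
    def encode(o, _encode=getattr(json.encoder, name)):
        return o if isinstance(o, RawJson) else _encode(o)
    setattr(json.encoder, name, encode)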
print(json.dumps([1, RawJson(u'["abc", 2]'), u'["def", 3]']))
# -> [1, ["abc", 2], "[\"def\", 3]"]
Upvotes: 6
Reputation: 1124518
If you are caching JSON strings, you need to first decode them to Python structures; there is no way for json.dumps() to distinguish between normal strings and strings that are really JSON-encoded structures:
return json.dumps({'item1': json.loads(item1), 'item2': json.loads(item2), 'msg': "123"})
Unfortunately, there is no option to include already-converted JSON data in this; the default function is expected to return Python values. You extract data from whatever object is passed in and return a value that can be converted to JSON, not a value that is already JSON itself.
The only other approach I can see is to insert "template" values, then use string replacement techniques to manipulate the JSON output to replace the templates with your actual cached data:
json_data = json.dumps({'item1': '==item1==', 'item2': '==item2==', 'msg': "123"})
return json_data.replace('"==item1=="', item1).replace('"==item2=="', item2)
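The template idea can be combined with the default hook from the question so that placeholders are generated automatically. What follows is only a minimal sketch under stated assumptions: RawJSON and dumps_with_raw are hypothetical names, not part of the json module, and the cached strings are assumed to be valid JSON already. Each wrapped string is swapped for a unique token, and the quoted token is replaced with the raw text after encoding.

```python
import json
import uuid


class RawJSON(object):
    """Hypothetical wrapper marking a string as already-encoded JSON."""

    def __init__(self, text):
        self.text = text


def dumps_with_raw(obj, **kwargs):
    """Serialize obj, splicing RawJSON payloads into the output verbatim."""
    replacements = {}

    def default(o):
        if isinstance(o, RawJSON):
            # Emit a unique token; json quotes it like any ordinary string.
            token = "@@raw-%s@@" % uuid.uuid4().hex
            replacements['"%s"' % token] = o.text
            return token
        raise TypeError("%r is not JSON serializable" % (o,))

    encoded = json.dumps(obj, default=default, **kwargs)
    # Swap each quoted token for the cached JSON text it stands for.
    for quoted_token, raw in replacements.items():
        encoded = encoded.replace(quoted_token, raw)
    return encoded


item1 = json.dumps({"x": 123})  # stands in for cache.get(1)
item2 = json.dumps([1, 2, 3])   # stands in for cache.get(2)
result = dumps_with_raw({"item1": RawJSON(item1), "item2": RawJSON(item2), "msg": "123"})
print(result)
```

Because the tokens are random, a collision with real data is very unlikely, but the cached strings are never validated, so malformed input produces malformed output. The wrapper also only works for values, not for dictionary keys.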
A third option is to cache item1 and item2 in non-serialized form, as a Python structure instead of a JSON string.
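As a rough illustration of that third option, assuming a hypothetical in-memory dict stands in for the real cache, the items are stored as Python structures and encoded only once, when the response is built:

```python
import json

# Hypothetical in-memory cache holding Python structures rather than JSON text.
cache = {1: {"x": 123}, 2: ["abc", 2]}


def get_cached_items():
    # The items are plain Python values, so a single dumps call encodes them once.
    return json.dumps({"item1": cache.get(1), "item2": cache.get(2), "msg": "123"})


print(get_cached_items())
```

The trade-off is that the encoding cost is paid on every request instead of once at caching time.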
Upvotes: 4