Saqib Ali

Reputation: 12585

Why is python's json.loads/dumps roundtrip lossy?

I'm using Python to serialize an object so I can store it in my cache. I serialize it with json.dumps() and, after getting it back out of the cache, deserialize it with json.loads(). I assumed this roundtrip would work without any trouble, but as you can see below, it fails.

>>> import json
>>> from collections import namedtuple
>>> x = {"hello": 1, "goodbye": 2}
>>> y = namedtuple('Struct', x.keys())(*x.values())
>>> y
Struct(goodbye=2, hello=1)

>>> json.loads(json.dumps(y))
[2, 1]           # <= I expected this to be the same value as y above!!

Why is this json.dumps/loads roundtrip lossy? What function can I use to serialize this object so that deserialization preserves its original value? I tried to use pickle, but it fails to serialize the object.

Upvotes: 1

Views: 500

Answers (1)

Jean-François Fabre

Reputation: 140188

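json serializes an object according to its type. It cannot serialize arbitrary objects, only the "basic" ones: dict, list, tuple (written with square brackets, exactly like a list), and of course strings, integers, floats, booleans and None. JSON itself has no tuple type, so json.loads always gives you back a list for anything in square brackets.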

Testing your object with isinstance succeeds on tuple, because classes created by namedtuple are designed to inherit from tuple:

from collections import namedtuple
x = {"hello": 1, "goodbye": 2}
y = namedtuple('Struct', x.keys())(*x.values())
print(isinstance(y, tuple))

The result is True.

So in the encoder.py file of the json module, your data matches the isinstance test in the iterencode method (code extract below, around line 311 in my Python 3.4 version):

    if isinstance(value, (list, tuple)):
        chunks = _iterencode_list(value, _current_indent_level)

So any type that inherits from list or tuple is serialized as a JSON array, just like a plain list, and its class name and field names are lost.
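To make that concrete, here is a small sketch. The names MyList and Point are made up for illustration and do not come from the question. It shows that a list subclass and a namedtuple both produce the same JSON text as a plain list, and that decoding gives back a plain list:

```python
import json
from collections import namedtuple


class MyList(list):
    """Hypothetical list subclass, used only for this illustration."""


# Hypothetical namedtuple type, used only for this illustration.
Point = namedtuple("Point", ["x", "y"])

# Both values pass the isinstance(value, (list, tuple)) test in the encoder,
# so both are written as a JSON array.
encoded_list = json.dumps(MyList([1, 2]))
encoded_point = json.dumps(Point(x=1, y=2))

# Decoding yields a plain list: the type and the field names are gone.
decoded_point = json.loads(encoded_point)

print(encoded_list)
print(encoded_point)
print(type(decoded_point).__name__, decoded_point)
```

Nothing in the JSON text records which class the values came from, which is why the roundtrip cannot restore it.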

A workaround for that is proposed here: Serializing a Python namedtuple to json
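One possible shape for such a workaround, as a minimal sketch rather than the exact code from the linked question: convert the namedtuple to a dict with _asdict() before dumping, and rebuild it from the decoded dict afterwards. The helper names dumps_struct and loads_struct are assumptions made for this example, and the sketch assumes the Struct class is defined once at module level so that the loading side knows which type to rebuild.

```python
import json
from collections import namedtuple

# Defined once at module level so both the dumping and loading code can use it.
Struct = namedtuple("Struct", ["hello", "goodbye"])


def dumps_struct(obj):
    """Hypothetical helper: serialize a namedtuple as a JSON object.

    _asdict() maps field names to values, so the field names survive
    in the JSON text instead of being flattened into an array.
    """
    return json.dumps(obj._asdict())


def loads_struct(text):
    """Hypothetical helper: rebuild a Struct from the JSON object."""
    return Struct(**json.loads(text))


y = Struct(hello=1, goodbye=2)
restored = loads_struct(dumps_struct(y))

print(restored)
print(restored == y)
```

This only restores the fields. The JSON text still does not say which class to use, so the loading code has to know that in advance, and nested namedtuples would need the same treatment applied recursively.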

Upvotes: 3
