Reputation: 10257
I have a Python set that contains objects with __hash__ and __eq__ methods in order to make certain no duplicates are included in the collection.
I need to json encode this result set, but passing even an empty set to the json.dumps method raises a TypeError.
File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python2.7/json/encoder.py", line 178, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([]) is not JSON serializable
I know I can create an extension to the json.JSONEncoder class that has a custom default method, but I'm not even sure where to begin in converting over the set. Should I create a dictionary out of the set values within the default method, and then return the encoding on that? Ideally, I'd like to make the default method able to handle all the datatypes that the original encoder chokes on (I'm using Mongo as a data source, so dates seem to raise this error too).
Any hint in the right direction would be appreciated.
EDIT:
Thanks for the answer! Perhaps I should have been more precise.
I utilized (and upvoted) the answers here to get around the limitations of the set being translated, but there are internal keys that are an issue as well.
The objects in the set are complex objects that translate to __dict__, but they themselves can also contain values for their properties that could be ineligible for the basic types in the json encoder.
There are a lot of different types coming into this set, and the hash basically calculates a unique id for the entity, but in the true spirit of NoSQL there's no telling exactly what the child object contains.
One object might contain a date value for starts, whereas another may have some other schema that includes no keys containing "non-primitive" objects.
That is why the only solution I could think of was to extend the JSONEncoder and replace the default method to switch on different cases, but I'm not sure how to go about this and the documentation is ambiguous. In nested objects, does the value returned from default go by key, or is it just a generic include/discard that looks at the whole object? How does that method accommodate nested values? I've looked through previous questions and can't seem to find the best approach to case-specific encoding (which unfortunately seems like what I'm going to need to do here).
Upvotes: 231
Views: 348469
Reputation: 21
You could try jsonwhatever:
https://pypi.org/project/jsonwhatever/
pip install jsonwhatever
from jsonwhatever import JsonWhatEver
set_a = {1,2,3}
jsonwe = JsonWhatEver()
string_res = jsonwe.jsonwhatever('set_string', set_a)
print(string_res)
Upvotes: 0
Reputation: 67133
You can create a custom encoder that returns a list when it encounters a set. Here's an example:
import json
class SetEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called only for objects the encoder cannot serialize natively
        if isinstance(obj, set):
            return list(obj)
        return json.JSONEncoder.default(self, obj)
data_str = json.dumps(set([1,2,3,4,5]), cls=SetEncoder)
print(data_str)
# Output: [1, 2, 3, 4, 5]
You can detect other types this way too. If you need to retain that the list was actually a set, you could use a custom encoding. Something like return {'type':'set', 'list':list(obj)} might work.
To illustrate nested types, consider serializing this:
class Something(object):
    pass
json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
This raises the following error:
TypeError: <__main__.Something object at 0x1691c50> is not JSON serializable
This indicates that the encoder will take the list result returned and recursively call the serializer on its children. To add a custom serializer for multiple types, you can do this:
class SetEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        if isinstance(obj, Something):
            return 'CustomSomethingRepresentation'
        return json.JSONEncoder.default(self, obj)
data_str = json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
print(data_str)
# Output: [1, 2, 3, 4, 5, "CustomSomethingRepresentation"]
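Since the question also mentions Mongo dates and objects that translate to __dict__, the same default hook can be stretched to cover those cases. This is only a sketch: the names MixedEncoder and Event are hypothetical, and writing dates as ISO 8601 strings is an assumed convention, not something the json module prescribes.

```python
import json
from datetime import date, datetime


class Event(object):
    """Hypothetical entity standing in for the objects in the question."""

    def __init__(self, name, starts):
        self.name = name
        self.starts = starts


class MixedEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is called once for each object the encoder cannot
        # handle, wherever it sits in the structure. Whatever it returns
        # is encoded again, so nested values get their own default() call.
        if isinstance(obj, (set, frozenset)):
            return list(obj)
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()  # assumed convention: ISO 8601 string
        if hasattr(obj, '__dict__'):
            return vars(obj)
        return json.JSONEncoder.default(self, obj)


event = Event('launch', datetime(2020, 1, 2, 3, 4, 5))
data_str = json.dumps({'events': set([event]), 'tags': set(['a'])},
                      cls=MixedEncoder, sort_keys=True)
print(data_str)
# {"events": [{"name": "launch", "starts": "2020-01-02T03:04:05"}], "tags": ["a"]}
```

The order of the isinstance checks matters only when a type could match more than one branch; anything that matches none still falls through to the base class and raises the usual TypeError.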
Upvotes: 191
Reputation: 226624
JSON notation has only a handful of native datatypes (objects, arrays, strings, numbers, booleans, and null), so anything serialized in JSON needs to be expressed as one of these types.
As shown in the json module docs, this conversion can be done automatically by a JSONEncoder and JSONDecoder, but then you would be giving up some other structure you might need (if you convert sets to a list, then you lose the ability to recover regular lists; if you convert sets to a dictionary using dict.fromkeys(s), then you lose the ability to recover dictionaries).
A more sophisticated solution is to build out a custom type that can coexist with other native JSON types. This lets you store nested structures that include lists, sets, dicts, decimals, datetime objects, etc.:
from decimal import Decimal  # used by the sample session
from json import dumps, loads, JSONEncoder, JSONDecoder
import pickle
class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        try:
            # latin-1 maps every byte to one character, so the pickle survives as str
            return {'_python_object': pickle.dumps(obj).decode('latin-1')}
        except pickle.PickleError:
            return super().default(obj)
def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(dct['_python_object'].encode('latin-1'))
    return dct
Here is a sample session showing that it can handle lists, dicts, and sets:
>>> data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]
>>> j = dumps(data, cls=PythonObjectEncoder)
>>> loads(j, object_hook=as_python_object)
[1, 2, 3, set(['knights', 'say', 'who', 'ni']), {'key': 'value'}, Decimal('3.14')]
Alternatively, it may be useful to use a more general purpose serialization technique such as YAML, Twisted Jelly, or Python's pickle module. These each support a much greater range of datatypes.
Upvotes: 132
Reputation: 1031
>>> import json
>>> set_object = set([1,2,3,4])
>>> json.dumps(list(set_object))
'[1, 2, 3, 4]'
Upvotes: 1
Reputation: 3104
If you know for sure that the only non-serializable data will be sets, there's a very simple (and dirty) solution:
json.dumps({"Hello World": {1, 2}}, default=tuple)
Only non-serializable data will be treated with the function given as default, so only the set will be converted to a tuple.
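A small sketch of where this shortcut stops working, using made-up sample data: values the encoder already supports never reach the default function, while a non-iterable value such as a datetime still fails, because tuple() itself raises a TypeError.

```python
import json
from datetime import datetime

# Natively supported values never reach the default function;
# only the set is passed to tuple() and then encoded as an array.
encoded = json.dumps({"ids": {7}, "names": ["a", "b"]}, default=tuple)
print(encoded)  # {"ids": [7], "names": ["a", "b"]}

# A datetime is not iterable, so tuple() fails and the dump still errors.
try:
    json.dumps({"when": datetime(2020, 1, 1)}, default=tuple)
    failed = False
except TypeError:
    failed = True
print(failed)  # True
```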
Upvotes: 22
Reputation: 1495
If you just need a quick dump and don't want to implement a custom encoder, you can use the following:
import simplejson as json
json_string = json.dumps(data, iterable_as_array=True)
This will convert all sets (and other iterables) into arrays. Just beware that those fields will stay arrays when you parse the JSON back. If you want to preserve the types, you need to write a custom encoder.
Also make sure to have simplejson installed and imported, since iterable_as_array is a simplejson option and not part of the standard json module.
You can find it on PyPI.
Upvotes: 8
Reputation: 1797
A shortened version of @AnttiHaapala's answer:
json.dumps(dict_with_sets, default=lambda x: list(x) if isinstance(x, set) else x)
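One caveat with returning the value unchanged for everything that is not a set: the encoder then meets the same unsupported object a second time and reports it as a circular reference, which can be a confusing error. The sketch below (set_to_list is a hypothetical helper name) raises TypeError explicitly instead, and shows the error that the pass-through variant produces.

```python
import json


def set_to_list(obj):
    # Convert sets; refuse everything else explicitly.
    if isinstance(obj, (set, frozenset)):
        return list(obj)
    raise TypeError("%r is not JSON serializable" % (obj,))


print(json.dumps({"s": {1}}, default=set_to_list))  # {"s": [1]}

# Returning the object unchanged makes the encoder see the same
# object twice, which it reports as a circular reference.
try:
    json.dumps(object(), default=lambda x: x)
    error_name = None
except ValueError as exc:
    error_name = type(exc).__name__
print(error_name)  # ValueError
```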
Upvotes: 6
Reputation: 134038
You don't need to make a custom encoder class to supply the default method - it can be passed in as a keyword argument:
import json
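def serialize_sets(obj):
    if isinstance(obj, set):
        return list(obj)
    # Returning obj unchanged here would surface as a "Circular reference" ValueError
    raise TypeError(repr(obj) + " is not JSON serializable")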
json_str = json.dumps(set([1,2,3]), default=serialize_sets)
print(json_str)
This results in [1, 2, 3] in all supported Python versions.
Upvotes: 42
Reputation: 921
One shortcoming of the accepted solution is that its output is very Python-specific, i.e. its raw JSON output cannot be read by a human or loaded by another language (e.g. JavaScript). Example:
db = {
"a": [ 44, set((4,5,6)) ],
"b": [ 55, set((4,3,2)) ]
}
j = dumps(db, cls=PythonObjectEncoder)
print(j)
Will get you:
{"a": [44, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsESwVLBmWFcQJScQMu"}], "b": [55, {"_python_object": "gANjYnVpbHRpbnMKc2V0CnEAXXEBKEsCSwNLBGWFcQJScQMu"}]}
I can propose a solution which downgrades the set to a dict containing a list on the way out, and back to a set when loaded into python using the same encoder, therefore preserving observability and language agnosticism:
from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle
class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        elif isinstance(obj, set):
            # Readable, language-agnostic form: a dict wrapping a list
            return {"__set__": list(obj)}
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}
def as_python_object(dct):
    if '__set__' in dct:
        return set(dct['__set__'])
    elif '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct
db = {
"a": [ 44, set((4,5,6)) ],
"b": [ 55, set((4,3,2)) ]
}
j = dumps(db, cls=PythonObjectEncoder)
print(j)
# The object_hook is what turns the {"__set__": [...]} dicts back into sets
ob = loads(j, object_hook=as_python_object)
print(ob["a"])
Which gets you:
{"a": [44, {"__set__": [4, 5, 6]}], "b": [55, {"__set__": [2, 3, 4]}]}
[44, {4, 5, 6}]
Note that serializing a dictionary which has an element with a key "__set__" will break this mechanism. So __set__ has now become a reserved dict key. Obviously feel free to use another, more deeply obfuscated key.
Upvotes: 1
Reputation: 1333
I adapted Raymond Hettinger's solution to python 3.
Here is what has changed:
- unicode disappeared
- default is now called with super()
- base64 is used to serialize the bytes type into str (because it seems that bytes in Python 3 can't be converted to JSON)
from decimal import Decimal
from base64 import b64encode, b64decode
from json import dumps, loads, JSONEncoder
import pickle
class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (list, dict, str, int, float, bool, type(None))):
            return super().default(obj)
        return {'_python_object': b64encode(pickle.dumps(obj)).decode('utf-8')}
def as_python_object(dct):
    if '_python_object' in dct:
        return pickle.loads(b64decode(dct['_python_object'].encode('utf-8')))
    return dct
data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]
j = dumps(data, cls=PythonObjectEncoder)
print(loads(j, object_hook=as_python_object))
# prints: [1, 2, 3, {'knights', 'who', 'say', 'ni'}, {'key': 'value'}, Decimal('3.14')]
Upvotes: 11
Reputation: 3119
If you only need to encode sets, not general Python objects, and want to keep it easily human-readable, a simplified version of Raymond Hettinger's answer can be used:
import json
import collections.abc  # the collections.Set alias was removed in Python 3.10
class JSONSetEncoder(json.JSONEncoder):
    """Use with json.dumps to allow Python sets to be encoded to JSON
    Example
    -------
    import json
    data = dict(aset=set([1,2,3]))
    encoded = json.dumps(data, cls=JSONSetEncoder)
    decoded = json.loads(encoded, object_hook=json_as_python_set)
    assert data == decoded     # Should assert successfully
    Any object that is matched by isinstance(obj, collections.abc.Set) will
    be encoded, but the decoded value will always be a normal Python set.
    """
    def default(self, obj):
        if isinstance(obj, collections.abc.Set):
            return dict(_set_object=list(obj))
        else:
            return json.JSONEncoder.default(self, obj)
def json_as_python_set(dct):
    """Decode json {'_set_object': [1,2,3]} to set([1,2,3])
    Example
    -------
    decoded = json.loads(encoded, object_hook=json_as_python_set)
    Also see :class:`JSONSetEncoder`
    """
    if '_set_object' in dct:
        return set(dct['_set_object'])
    return dct
Upvotes: 5
Reputation: 6653
JSON only supports objects (dictionaries), arrays (lists), and primitive types (strings, numbers, booleans, and null). A set has to be converted to one of these before it can be encoded.
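To make that concrete, here is a short sketch with made-up sample values showing what the standard json module accepts without any custom encoder, and that a tuple comes back as a list because JSON has only one array type:

```python
import json

# Every value here maps directly onto a native JSON type.
sample = {"text": "hi", "count": 3, "ratio": 0.5, "flag": True,
          "nothing": None, "items": [1, 2], "pair": (1, 2)}
encoded = json.dumps(sample)
print(encoded)

# Tuples are written as JSON arrays, so they decode as lists.
decoded = json.loads(encoded)
print(decoded["pair"])  # [1, 2]
```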
Upvotes: 6