Reputation: 1204
I have a nested Python dictionary that is serialized into a JSON string, which I then compress with gzip and base64-encode. However, once I convert it back to the JSON string, it contains \\ sequences that aren't in the original JSON string before conversion. This happens at each of the nested dictionary levels. These are the functions:
import json
import io
import gzip
import base64
import zlib
import numpy as np
class numpy_encoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        else:
            return super(numpy_encoder, self).default(obj)
def dict_json_dump(dictionary):
    dumped = json.dumps(dictionary, cls=numpy_encoder, separators=(",", ":"))
    return dumped
def gzip_json_encoder(json_string):
    stream = io.BytesIO()
    with gzip.open(filename=stream, mode='wt') as zipfile:
        json.dump(json_string, zipfile)
    return stream
def base64_encoder(gzip_string):
    return base64.b64encode(gzip_string.getvalue())
We can use the functions as follows:
json_dict = pe.dict_json_dump(test_dictionary)
gzip_json = pe.gzip_json_encoder(json_dict)
base64_gzip = pe.base64_encoder(gzip_json)
When I check base64_gzip with the following call:
json_str = zlib.decompress(base64.b64decode(base64_gzip), 16 + zlib.MAX_WBITS)
I get the JSON string back in a format like this (truncated):
b'"{\\"trainingResults\\":{\\"confusionMatrix\\":{\\"tn\\":2,\\"fn\\":1,\\"tp\\":1,\\"fp\\":1},\\"auc\\":{\\"score\\":0.5,\\"tpr\\":[0.0,0.5,0.5,1.0],\\"fpr\\":[0.0,0.333,0.667,1.0]},\\"f1\\"
This isn't the full string, but the contents of the string itself are accurate. What I'm not sure about is why the backslashes are showing up when I convert it back. Does anyone have any suggestions? I tried UTF-8 encoding on my JSON as well, with no luck. Any help is appreciated!
Upvotes: 2
Views: 1264
Reputation: 782407
You're doing JSON encoding twice: once in dict_json_dump() and again in gzip_json_encoder(). Since json_string is already encoded, you don't need to call json.dump() in gzip_json_encoder().
def gzip_json_encoder(json_string):
    stream = io.BytesIO()
    with gzip.open(filename=stream, mode='wt') as zipfile:
        zipfile.write(json_string)
    return stream
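To check the fix end to end, here is a minimal round-trip sketch. The helper round_trip() and the sample dictionary are made up for illustration, and the explicit encoding="utf-8" is an assumption added so the result does not depend on the platform default. The last lines show where the backslashes came from: running json.dumps() on a string that is already JSON wraps it in quotes and escapes every inner quote.

```python
import base64
import gzip
import io
import json
import zlib


def gzip_json_encoder(json_string):
    # json_string is already JSON text, so write it as-is
    # instead of passing it through json.dump() a second time.
    stream = io.BytesIO()
    with gzip.open(filename=stream, mode="wt", encoding="utf-8") as zipfile:
        zipfile.write(json_string)
    return stream


def round_trip(dictionary):
    # Hypothetical helper: dict -> JSON text -> gzip -> base64, then back to text.
    json_string = json.dumps(dictionary, separators=(",", ":"))
    base64_gzip = base64.b64encode(gzip_json_encoder(json_string).getvalue())
    raw = zlib.decompress(base64.b64decode(base64_gzip), 16 + zlib.MAX_WBITS)
    return json_string, raw.decode("utf-8")


# Made-up sample data, loosely shaped like the dictionary in the question.
sample = {"trainingResults": {"confusionMatrix": {"tn": 2, "fn": 1}}}
original, restored = round_trip(sample)
print(restored == original)            # the decoded text matches the JSON we wrote
print(json.loads(restored) == sample)  # and it parses back to the same dictionary

# Encoding a second time yields a JSON *string literal* with escaped quotes,
# which is what the output in the question shows.
double_encoded = json.dumps(original)
print(double_encoded)
```

If you already have data that was encoded twice, calling json.loads() on it once returns the inner JSON string, and a second json.loads() returns the dictionary.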
Upvotes: 3