Why do json.dump and json.dumps behave differently?

Question

I use a custom JSONEncoder to alter some values in the dict I want to dump.

import json

class CustomJSONEncoder(json.JSONEncoder):
    def encode(self, obj):
        #obj = dict(obj)
        obj['__key'] += 1

        return super().encode(obj)

data = dict(__key=1)

print('dumps', json.dumps(data, cls=CustomJSONEncoder))

# Use context managers so the file is flushed and closed before it is read back.
with open('test.json', 'w') as f:
    json.dump(data, f, cls=CustomJSONEncoder)
with open('test.json', 'r') as f:
    print('dump', json.load(f))

Running this gives the expected result:

dumps {"__key": 2}
dump {'__key': 2}

But when I uncomment the commented-out line (which is required for what I ultimately want to do because I don't want to change the original data), dumps and dump behave differently:

dumps {"__key": 2}
dump {'__key': 1}

Why does this happen? Is there a workaround for this?

jfschaefer · Accepted Answer

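I do not think the line you commented out is the root cause. Without it, the earlier json.dumps call mutates data in place, so json.dump only appears to work because it writes the value that has already been incremented.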

Instead, I think the difference is that json.dumps calls encode, while json.dump calls iterencode, which you have not overridden. Presumably json.dump uses iterencode so it can write the output to the file chunk by chunk, whereas iterative encoding is not useful for json.dumps, which holds the entire result in memory anyway.
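
One way to check this is to record which encoder methods each function invokes. The following is a minimal sketch, not part of the json module: TracingEncoder and trace are hypothetical names introduced here for illustration, and the observed call order reflects CPython's implementation.

```python
import io
import json


class TracingEncoder(json.JSONEncoder):
    """Hypothetical helper: records which encoder methods are invoked."""

    calls = []

    def encode(self, obj):
        TracingEncoder.calls.append('encode')
        return super().encode(obj)

    def iterencode(self, obj, **kwargs):
        TracingEncoder.calls.append('iterencode')
        return super().iterencode(obj, **kwargs)


def trace(func, *args):
    """Run json.dumps or json.dump with TracingEncoder and return the recorded calls."""
    TracingEncoder.calls.clear()
    func(*args, cls=TracingEncoder)
    return list(TracingEncoder.calls)


dumps_calls = trace(json.dumps, {'__key': 1})
dump_calls = trace(json.dump, {'__key': 1}, io.StringIO())

print('dumps ->', dumps_calls)
print('dump  ->', dump_calls)
```

With CPython, the list recorded for json.dumps should start with encode, while the list for json.dump should not contain encode at all, which is why an override of encode alone never runs for json.dump.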

Looking at the source code for JSONEncoder, it seems that encode itself calls iterencode, so, depending on what you want to achieve, it might make more sense to override that one instead:

    def iterencode(self, obj, **kwargs):
        obj['__key'] += 1
        return super().iterencode(obj, **kwargs)
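
Since the question asks for the original data to stay unchanged, the override can work on a copy. The following is a sketch under some assumptions: CopyingEncoder is a hypothetical name, the isinstance guard is added only so that other inputs pass through untouched, and io.StringIO stands in for the file.

```python
import io
import json


class CopyingEncoder(json.JSONEncoder):
    """Hypothetical encoder: increments '__key' on a copy, leaving the input intact."""

    def iterencode(self, obj, **kwargs):
        if isinstance(obj, dict) and '__key' in obj:
            obj = dict(obj)  # shallow copy, so the caller's dict is not mutated
            obj['__key'] += 1
        return super().iterencode(obj, **kwargs)


data = dict(__key=1)

# json.dumps reaches iterencode through encode.
as_string = json.dumps(data, cls=CopyingEncoder)

# json.dump calls iterencode directly; an in-memory buffer replaces the file here.
buffer = io.StringIO()
json.dump(data, buffer, cls=CopyingEncoder)

print('dumps', as_string)
print('dump', json.loads(buffer.getvalue()))
print('original', data)
```

Because both functions end up in iterencode, they should produce the same incremented value while data keeps its original value. Note that dict(obj) is only a shallow copy, so changes to nested values would still need copy.deepcopy.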
