bbengfort

Reputation: 5392

How do I keep the JSON key order fixed with Python 3 json.dumps?

I've noticed some strange behavior in Python 3's implementation of json.dumps: the key order changes from one execution to the next when I dump the same object. Searching didn't help, because I don't care about sorting the keys; I just want their order to stay the same! Here is an example script:

import json

data = {
    'number': 42,
    'name': 'John Doe',
    'email': '[email protected]',
    'balance': 235.03,
    'isadmin': False,
    'groceries': [
        'apples',
        'bananas',
        'pears',
    ],
    'nested': {
        'complex': True,
        'value': 2153.23412
    }
}

print(json.dumps(data, indent=2))

When I run this script I get different outputs every time, for example:

$ python print_data.py 
{
  "groceries": [
    "apples",
    "bananas",
    "pears"
  ],
  "isadmin": false,
  "nested": {
    "value": 2153.23412,
    "complex": true
  },
  "email": "[email protected]",
  "number": 42,
  "name": "John Doe",
  "balance": 235.03
}

But then I run it again and I get:

$ python print_data.py 
{
  "email": "[email protected]",
  "balance": 235.03,
  "name": "John Doe",
  "nested": {
    "value": 2153.23412,
    "complex": true
  },
  "isadmin": false,
  "groceries": [
    "apples",
    "bananas",
    "pears"
  ],
  "number": 42
}

I understand that dictionaries are unordered collections and that the order is based on a hash function; however, in Python 2 the order (whatever it is) is fixed and doesn't change between executions. This is making my tests difficult to run, because I need to compare the JSON output of two different modules!

Any idea what is going on, and how to fix it? Note that I would like to avoid using an OrderedDict or performing any sorting; what matters is that the string representation remains the same between executions. Also, this is for testing purposes only and doesn't have any effect on the implementation of my module.

Upvotes: 14

Views: 8553

Answers (3)

paulie4

Reputation: 502

This behavior changed in Python 3.7. The json documentation says this:

Prior to Python 3.7, dict was not guaranteed to be ordered, so inputs and outputs were typically scrambled unless collections.OrderedDict was specifically requested. Starting with Python 3.7, the regular dict became order preserving, so it is no longer necessary to specify collections.OrderedDict for JSON generation and parsing.
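
As a small illustration of the quoted behavior, here is a sketch that assumes it runs on Python 3.7 or later. The sample keys are borrowed from the question; nothing else is assumed beyond the standard json module.

```python
import json

# Insertion order here is deliberately not alphabetical.
data = {"number": 42, "name": "John Doe", "balance": 235.03}

encoded = json.dumps(data)
decoded = json.loads(encoded)

# On Python 3.7+, keys are written in insertion order
# and keep that order after a round trip through json.loads.
print(encoded)
print(list(decoded))
```

On older interpreters the same script may print the keys in a different order from run to run, which is what the question describes.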

Upvotes: 1

Smit Johnth

Reputation: 2688

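The story behind this behavior is a hash-collision denial-of-service vulnerability. To prevent it, the hash code of the same value has to differ from one process to the next, so that an attacker cannot predict which keys will collide.

Python 2 leaves this behavior (hash randomization) disabled by default, although it can be switched on with the -R flag, probably for compatibility reasons, since enabling it would, for example, break doctests. Python 3 enables it by default as of 3.3, presumably (an assumption) because it did not need that compatibility.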

Upvotes: 0

Martijn Pieters

Reputation: 1121644

Python dictionaries and JSON objects are unordered. You can ask json.dumps() to sort the keys in the output; this is meant to ease testing. Set the sort_keys parameter to True:

print(json.dumps(data, indent=2, sort_keys=True))
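
For the testing use case in the question (comparing the JSON output of two modules), a minimal sketch could look like the following. The helper name canonical and the two sample dictionaries are hypothetical and only serve the illustration.

```python
import json

# Same content, built in a different key order,
# as two separate modules might produce it.
first = {"name": "John Doe", "number": 42,
         "nested": {"value": 1.5, "complex": True}}
second = {"number": 42,
          "nested": {"complex": True, "value": 1.5},
          "name": "John Doe"}

def canonical(obj):
    """Serialize with sorted keys so equal dicts give equal strings."""
    return json.dumps(obj, indent=2, sort_keys=True)

# sort_keys also applies to nested dictionaries,
# so the two strings match exactly.
print(canonical(first) == canonical(second))
```

Because the keys are sorted at every nesting level, the string comparison no longer depends on dictionary ordering at all.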

See Why is the order in Python dictionaries and sets arbitrary? as to why you see a different order each time.

You can set the PYTHONHASHSEED environment variable to an integer value to 'lock' the dictionary order; use this only to run tests and not in production, as the whole point of hash randomisation is to prevent an attacker from trivially DoS-ing your program.
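
To see the effect of the seed, here is a rough sketch that starts fresh interpreters with PYTHONHASHSEED set and compares the hash of a fixed string. The helper child_hash and the seed value 123 are made up for the example, and it assumes Python 3.7+ for subprocess.run(capture_output=True).

```python
import os
import subprocess
import sys

# Child program: print the hash of a fixed string.
CHILD = "print(hash('name'))"

def child_hash(seed):
    """Run a fresh interpreter with PYTHONHASHSEED set; return its output."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# With the same seed, separate processes agree on the hash,
# which is what keeps hash-based ordering stable between runs.
print(child_hash(123) == child_hash(123))
```

For the script in the question, the equivalent from a shell would be something like PYTHONHASHSEED=0 python print_data.py, where the value 0 disables hash randomization altogether.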

Upvotes: 20
