WhiteHotLoveTiger

Reputation: 2228

Formatting JSON in Python

What is the simplest way to pretty-print a string of JSON as a string with indentation when the initial JSON string is formatted without extra spaces or line breaks?

Currently I'm running json.loads() and then running json.dumps() with indent=2 on the result. This works, but it feels like I'm throwing a lot of compute down the drain.

Is there a simpler or more efficient (built-in) way to pretty-print a JSON string while keeping it valid JSON?

Example

import requests
import json

response = requests.get('http://spam.eggs/breakfast')
one_line_json = response.content.decode('utf-8')
pretty_json = json.dumps(json.loads(one_line_json), indent=2)

print(f'Original: {one_line_json}')
print(f'Pretty: {pretty_json}')

Output:

Original: {"breakfast": ["spam", "spam", "eggs"]}
Pretty: {
  "breakfast": [
    "spam",
    "spam",
    "eggs"
  ]
}

Upvotes: 5

Views: 26031

Answers (2)

Scott Skiles

Reputation: 3847

json.dumps(obj, indent=2) is better than pprint because:

  1. It is faster with the same load methodology.
  2. It is about as simple to use.
  3. Its output is valid JSON, whereas pprint's is not.
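
Point 3 can be checked directly. The following is a minimal sketch, assuming a small sample payload that is not from the original question, with a hypothetical helper is_valid_json: the json.dumps output parses back with json.loads, while the pprint.pformat output is a Python repr (single quotes, True, None) and does not.

```python
import json
import pprint

# Sample payload (an assumption for illustration, not data from the question).
one_line_json = '{"breakfast": ["spam", "spam", "eggs"], "foo": true, "bar": null}'
data = json.loads(one_line_json)

via_dumps = json.dumps(data, indent=2)   # JSON text
via_pprint = pprint.pformat(data)        # Python repr of the dict


def is_valid_json(text):
    """Hypothetical helper: return True if text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(via_dumps))   # True
print(is_valid_json(via_pprint))  # False
```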

pprint_vs_dumps.py

import cProfile
import json
import pprint
from urllib.request import urlopen


def custom_pretty_print():
    url_to_read = "https://www.cbcmusic.ca/Component/Playlog/GetPlaylog?stationId=96&date=2018-11-05"
    with urlopen(url_to_read) as resp:
        pretty_json = json.dumps(json.load(resp), indent=2)
    print(f'Pretty: {pretty_json}')


def pprint_json():
    url_to_read = "https://www.cbcmusic.ca/Component/Playlog/GetPlaylog?stationId=96&date=2018-11-05"
    with urlopen(url_to_read) as resp:
        info = json.load(resp)
    pprint.pprint(info)


cProfile.run('custom_pretty_print()')
# 71027 function calls (42309 primitive calls) in 0.084 seconds

cProfile.run('pprint_json()')
# 164241 function calls (140121 primitive calls) in 0.208 seconds

Thanks @tobias_k for pointing out my errors along the way.

Upvotes: 11

r.ook

Reputation: 13858

I think that for printing real JSON, json.dumps(json.loads()) is probably as good as it gets. timeit(number=10000) for the following took about 5.659214497s:

import json
d = {   
        'breakfast': [
            'spam', 'spam', 'eggs', 
            {
                'another': 'level', 
                'nested': [
                    {'a':'b'}, 
                    {'c':'d'}
                ]
            }
        ], 
        'foo': True,
        'bar': None
    }
s = json.dumps(d)
q = json.dumps(json.loads(s), indent=2)
print(q)

I tried pprint, but it can't pretty-print the raw JSON string: you have to convert it to a Python dict first, and the output is then a Python repr rather than JSON, so true, false and null become True, False and None, as mentioned in the other answer. It also sorts dict keys by default instead of keeping the order in which the items appeared, so it's not great if order matters for readability.
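
To illustrate the ordering difference, here is a minimal sketch, assuming Python 3.7+ (where dicts keep insertion order) and a made-up sample string whose keys are deliberately not in alphabetical order:

```python
import json
import pprint

# Made-up sample (an assumption for illustration); keys are not alphabetical.
s = '{"zebra": 1, "apple": true, "mango": null}'
d = json.loads(s)

# json.dumps keeps the keys in the order they appeared in the input.
pretty = json.dumps(d, indent=2)
print(pretty)

# pprint sorts the keys by default and prints Python literals, not JSON.
print(pprint.pformat(d))
```

The first print keeps zebra, apple, mango in that order with true and null intact; the second shows the keys sorted, with True and None.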

Just for fun I whipped up the following function:

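def pretty_json_for_savages(j, indentor='  '):
    # Naive character scan: every brace, bracket and comma is treated as
    # structural, so strings containing those characters will be mangled.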
    ind_lvl = 0
    temp = ''
    for i, c in enumerate(j):
        if c in '{[':
            print(indentor*ind_lvl + temp.strip() + c)
            ind_lvl += 1
            temp = ''
        elif c in '}]':
            print(indentor*ind_lvl + temp.strip() + '\n' + indentor*(ind_lvl-1) + c, end='')
            ind_lvl -= 1
            temp = ''
        elif c in ',':
            print(indentor*(0 if j[i-1] in '{}[]' else ind_lvl) + temp.strip() + c)
            temp = ''
        else:
            temp += c
    print('')

# {
#   "breakfast":[
#     "spam",
#     "spam",
#     "eggs",
#     {
#       "another": "level",
#       "nested":[
#         {
#           "a": "b"
#         },
#         {
#           "c": "d"
#         }        
#       ]      
#     }    
#   ],
#   "foo": true,
#   "bar": null
# }

It prints reasonably well, and unsurprisingly it took a whopping 16.701202023s to run in timeit(number=10000), about 3 times as long as json.dumps(json.loads()). It's probably not worthwhile to build your own function for this unless you spend some time optimizing it. Given the lack of a builtin that does the same, it's probably best to stick to your guns, since further effort will most likely yield diminishing returns.
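
For anyone who wants to reproduce this kind of measurement, a minimal sketch with timeit follows. The payload mirrors the dict above, the helper name round_trip is hypothetical, and the absolute numbers will vary by machine and Python version.

```python
import json
import timeit

# Same shape as the dict used above, serialized to a compact one-line string.
s = json.dumps({
    'breakfast': [
        'spam', 'spam', 'eggs',
        {'another': 'level', 'nested': [{'a': 'b'}, {'c': 'd'}]},
    ],
    'foo': True,
    'bar': None,
})


def round_trip(text, indent=2):
    """Hypothetical helper: parse a JSON string and re-serialize it indented."""
    return json.dumps(json.loads(text), indent=indent)


# Time 10000 round trips, matching the number used in the timings above.
elapsed = timeit.timeit(lambda: round_trip(s), number=10000)
print(f'{elapsed:.3f}s for 10000 round trips')
```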

Upvotes: 3
