Room'on

Reputation: 93

How to parse JSON when there are NULL values inside?

I'm trying to parse JSON data, but when some branches of the JSON are NULL, Python raises an error: TypeError: 'NoneType' object is not subscriptable.

This situation is OK:

import json

x = '''[{"address":{"city": "city1","street": "street1"}},
        {"address":{"city": "city2","street": "street2"}}]'''
source = json.loads(x)
data = []
for s in source:
    data.append([s['address']['city'],
                 s['address']['street']])
print(data)

And this one gives me an error:

import json

x = '''[{"address":{"city": "city1","street": "street1"}},
        {"address": null},
        {"address":{"city": "city2","street": "street2"}}]'''
source = json.loads(x)
data = []
for s in source:
    data.append([s['address']['city'],
                 s['address']['street']])
print(data)

I would like to get NULL (None) values in the second case. What is the shortest way to do it?

Update #1: I have a lot of other data, not only "address", and any of it can also be NULL. That is why I can't use "if" statements (there would be too many different combinations).

Update #2: To make my question clearer (in the real case I have 25 different parameters, not 3 as below):

[
    {
        "address": {
            "city": "city1",
            "street": "street1"
        },
        "car": null,
        "person": {
            "age": "30",
            "name": "John"
        }
    },
    {
        "address": null,
        "car": {
            "color": "red",
            "year": "2015"
        },
        "person": {
            "age": "31",
            "name": "Peter"
        }
    },
    {
        "address": {
            "city": "city2",
            "street": "street2"
        },
        "car": {
            "color": "green",
            "year": "2017"
        },
        "person": null
    }
]

    data.append([s['address']['city'],
                 s['address']['street'],
                 s['person']['name'],
                 s['person']['age'],
                 s['car']['year'],
                 s['car']['color']])

Upvotes: 0

Views: 4737

Answers (4)

martineau

Reputation: 123463

Here's a generalized way to handle the situation when you have JSON objects nested one-level deep that might have NULL values. It makes use of the optional object_hook= keyword argument to pass a callback function to json.loads() (as does json.load()). In this case, the function converts any None values in the upper-level dicts into empty NoneDict dictionary subclass instances.

NoneDicts simply return None as the value of missing keys instead of raising KeyErrors. Optimization note: if you never change these objects (i.e. they're read-only), you really only need to create a single global instance and always use it in the converter() function.

import json
from pprint import pprint


class NoneDict(dict):
    """ dict subclass that returns a value of None for missing keys instead
        of raising a KeyError. Note: doesn't add item to dictionary.
    """
    def __missing__(self, key):
        return None


def converter(decoded_dict):
    """ Convert any None values in decoded dict into empty NoneDict's. """
    return {k: NoneDict() if v is None else v for k, v in decoded_dict.items()}

# The following JSON data is equivalent to what you have in Update #2 of your
# question, it's just formatted more compactly.
x = '''
    [{"address": {"city": "city1", "street": "street1"},
      "car": null,
      "person": {"age": "30", "name": "John"}},
     {"address": null,
      "car": {"color": "red", "year": "2015"},
      "person": {"age": "31", "name": "Peter"}},
     {"address": {"city": "city2", "street": "street2"},
      "car": {"color": "green", "year": "2017"},
      "person": null}]
'''

source = json.loads(x, object_hook=converter)
data = []

for s in source:
    data.append([s['address']['city'],
                 s['address']['street'],
                 s['person']['name'],
                 s['person']['age'],
                 s['car']['year'],
                 s['car']['color']])

pprint(data)

Output:

[['city1', 'street1', 'John', '30', None, None],
 [None, None, 'Peter', '31', '2015', 'red'],
 ['city2', 'street2', None, None, '2017', 'green']]

Note that the part near the very end could be written like this to make it more "data-driven":

items = (('address', 'city'),
         ('address', 'street'),
         ('person', 'name'),
         ('person', 'age'),
         ('car', 'year'),
         ('car', 'color'))

for s in source:
    data.append([s[k1][k2] for k1, k2 in items])
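
The optimization note above (one shared read-only instance) could look roughly like the sketch below. The name EMPTY is made up for illustration, and the approach assumes nothing ever writes to that instance, since every null section would then share the same object.

```python
import json


class NoneDict(dict):
    """dict subclass that returns None for missing keys."""
    def __missing__(self, key):
        return None


# One shared, empty instance (hypothetical name). Only safe if it is
# never modified, because every null section points at this same object.
EMPTY = NoneDict()


def converter(decoded_dict):
    """Replace None values in a decoded dict with the shared EMPTY."""
    return {k: EMPTY if v is None else v for k, v in decoded_dict.items()}


x = '[{"address": null, "car": {"color": "red", "year": "2015"}}]'
source = json.loads(x, object_hook=converter)
row = [source[0]['address']['city'], source[0]['car']['color']]
print(row)
```

One caveat worth keeping in mind: json calls object_hook for every decoded object, including the inner ones, so a null leaf such as "city": null would also be replaced by the empty NoneDict rather than staying None.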

Upvotes: 2

Omar

Reputation: 149

The problem is that in the second case s['address'] evaluates to None and it's not subscriptable. You should check that the value is not None and handle that case separately:

import json

x = '''[{"address":{"city": "city1","street": "street1"}},
        {"address": null},
        {"address":{"city": "city2","street": "street2"}}]'''                          
source = json.loads(x)
data = []
for s in source:
    if s['address'] is not None:
        data.append([s['address']['city'],
                     s['address']['street']])
    else:
        data.append(None)
print(data)

This will print: [['city1', 'street1'], None, ['city2', 'street2']]

Edit: Try this:

import pandas as pd
df = pd.json_normalize(source)  # pd.io.json.json_normalize in pandas < 1.0
df = df.where((pd.notnull(df)), None)
data = df[[column for column in df.columns if '.' in column]]
print(data.values.tolist())

Output: [['city1', 'street1', None, None, '30', 'John'], [None, None, 'red', '2015', '31', 'Peter'], ['city2', 'street2', 'green', '2017', None, None]]

Upvotes: 1

I_Adze

Reputation: 89

You'll have to check whether address is None before trying to access anything in it. For example:

for s in source:
    if s['address']:
        data.append([s['address']['city'], s['address']['street']])
    else:
        # whatever behaviour you want for None values, for example:
        data.append([None, None])
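
Note that the truthiness test `if s['address']:` is slightly broader than a None check: an empty JSON object `{}` is falsy too. A small sketch of the difference, using made-up sample data:

```python
import json

# Sample data (assumed for illustration): null, empty object, full object.
source = json.loads('[{"address": null}, {"address": {}}, '
                    '{"address": {"city": "city1", "street": "street1"}}]')

# Truthiness treats both None and {} as "missing".
truthy = [bool(s['address']) for s in source]
# An explicit None check only catches JSON null.
not_none = [s['address'] is not None for s in source]

print(truthy)    # [False, False, True]
print(not_none)  # [False, True, True]
```

Which one you want depends on whether an empty object should be handled like null in your data.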

Upvotes: 1

ForceBru

Reputation: 44838

Handle the None case separately:

for s in source:
    address = s['address']

    data.append(
       [None, None] if address is None
       else [address['city'], address['street']]
    )
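
With many sections (the question mentions 25 parameters), the same idea can be folded into a small helper so no per-section `if` is needed. This is only a sketch: `pick` and `ITEMS` are hypothetical names, and `or {}` treats any falsy section (null, missing, or empty) as empty.

```python
import json

x = '''
    [{"address": {"city": "city1", "street": "street1"},
      "car": null,
      "person": {"age": "30", "name": "John"}},
     {"address": null,
      "car": {"color": "red", "year": "2015"},
      "person": null}]
'''

# (section, field) pairs to extract, in output order.
ITEMS = (('address', 'city'), ('address', 'street'),
         ('person', 'name'), ('person', 'age'),
         ('car', 'year'), ('car', 'color'))


def pick(record, section, field):
    """Return record[section][field], or None if the section is null or missing."""
    return (record.get(section) or {}).get(field)


source = json.loads(x)
data = [[pick(s, k1, k2) for k1, k2 in ITEMS] for s in source]
print(data)
```

This leaves the parsed data untouched and keeps the None handling in one place.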

Upvotes: 1
