Reputation: 93
I'm trying to parse JSON data, but when some branches of the JSON are null, Python gives me an error:
TypeError: 'NoneType' object is not subscriptable
This situation is OK:
import json
x = '''[{"address":{"city": "city1","street": "street1"}},
{"address":{"city": "city2","street": "street2"}}]'''
source = json.loads(x)
data = []
for s in source:
    data.append([s['address']['city'],
                 s['address']['street']])
print(data)
And this one gives me an error:
import json
x = '''[{"address":{"city": "city1","street": "street1"}},
{"address": null},
{"address":{"city": "city2","street": "street2"}}]'''
source = json.loads(x)
data = []
for s in source:
    data.append([s['address']['city'],
                 s['address']['street']])
print(data)
I would like to get NULL (None) values in the second case. What is the shortest way to do it?
Update #1:
I have a lot of other data, not only "address", and any of it can also be NULL. That is why I can't use if statements (there would be too many different combinations).
Update #2: To make my question clearer (in the real case I have 25 different parameters, not 3 as below):
[
{
"address": {
"city": "city1",
"street": "street1"
},
"car": null,
"person": {
"age": "30",
"name": "John"
}
},
{
"address": null,
"car": {
"color": "red",
"year": "2015"
},
"person": {
"age": "31",
"name": "Peter"
}
},
{
"address": {
"city": "city2",
"street": "street2"
},
"car": {
"color": "green",
"year": "2017"
},
"person": null
}
]
data.append([s['address']['city'],
             s['address']['street'],
             s['person']['name'],
             s['person']['age'],
             s['car']['year'],
             s['car']['color']])
Upvotes: 0
Views: 4737
Reputation: 123463
Here's a generalized way to handle JSON objects nested one level deep that might contain NULL values. It uses the optional object_hook= keyword argument of json.loads() (json.load() accepts it as well) to pass in a callback function. The decoder calls that function with every JSON object it decodes, and here the function converts any None values in those dicts into empty NoneDict instances, a dictionary subclass.

NoneDicts simply return None as the value of missing keys instead of raising KeyError. Optimization note: if you never change these objects (i.e. they're read-only), you only need to create a single global instance and always use it in the converter() function.
import json
from pprint import pprint

class NoneDict(dict):
    """ dict subclass that returns a value of None for missing keys instead
        of raising a KeyError. Note: doesn't add item to dictionary.
    """
    def __missing__(self, key):
        return None

def converter(decoded_dict):
    """ Convert any None values in decoded dict into empty NoneDict's. """
    return {k: NoneDict() if v is None else v for k, v in decoded_dict.items()}

# The following JSON data is equivalent to what you have in Update #2 of your
# question, it's just formatted more compactly.
x = '''
[{"address": {"city": "city1", "street": "street1"},
"car": null,
"person": {"age": "30", "name": "John"}},
{"address": null,
"car": {"color": "red", "year": "2015"},
"person": {"age": "31", "name": "Peter"}},
{"address": {"city": "city2", "street": "street2"},
"car": {"color": "green", "year": "2017"},
"person": null}]
'''
source = json.loads(x, object_hook=converter)
data = []
for s in source:
    data.append([s['address']['city'],
                 s['address']['street'],
                 s['person']['name'],
                 s['person']['age'],
                 s['car']['year'],
                 s['car']['color']])
pprint(data)
Output:
[['city1', 'street1', 'John', '30', None, None],
[None, None, 'Peter', '31', '2015', 'red'],
['city2', 'street2', None, None, '2017', 'green']]
Note that the loop at the end could be written like this to make it more "data-driven":
items = (('address', 'city'),
         ('address', 'street'),
         ('person', 'name'),
         ('person', 'age'),
         ('car', 'year'),
         ('car', 'color'))
for s in source:
    data.append([s[k1][k2] for k1, k2 in items])
Upvotes: 2
Reputation: 149
The problem is that in the second case s['address'] evaluates to None and it's not subscriptable. You should check that the value is not None and handle that case separately:
import json
x = '''[{"address":{"city": "city1","street": "street1"}},
{"address": null},
{"address":{"city": "city2","street": "street2"}}]'''
source = json.loads(x)
data = []
for s in source:
    if s['address'] is not None:
        data.append([s['address']['city'],
                     s['address']['street']])
    else:
        data.append(None)
print(data)
This will print: [['city1', 'street1'], None, ['city2', 'street2']]
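The same is not None check can be applied per field instead of per row, so a null branch only blanks out its own columns and the check is written once no matter how many parameters there are. A sketch, assuming the Update #2 structure (one level of nesting) and a hypothetical items table of (top-level key, nested key) pairs:

```python
import json

x = '''[{"address": {"city": "city1", "street": "street1"}, "car": null},
        {"address": null, "car": {"color": "red", "year": "2015"}}]'''
source = json.loads(x)

# Hypothetical list of (top-level key, nested key) pairs to extract.
items = [('address', 'city'), ('address', 'street'),
         ('car', 'year'), ('car', 'color')]

data = []
for s in source:
    row = []
    for k1, k2 in items:
        branch = s[k1]
        # A null branch yields None for each of its fields instead of raising TypeError.
        row.append(branch[k2] if branch is not None else None)
    data.append(row)
print(data)
```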
Edit: Try this:
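import pandas as pd
# source is the list returned by json.loads() for the Update #2 data
df = pd.json_normalize(source)  # pandas >= 1.0; older versions: pd.io.json.json_normalize
df = df.where(pd.notnull(df), None)
# keep only the flattened "parent.child" columns
data = df[[column for column in df.columns if '.' in column]]
print(data.values.tolist())
Output (the column order may differ between pandas versions): [['city1', 'street1', None, None, '30', 'John'], [None, None, 'red', '2015', '31', 'Peter'], ['city2', 'street2', 'green', '2017', None, None]]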
Upvotes: 1
Reputation: 89
You'll have to check whether address is None before trying to access things in it. For example:
for s in source:
    if s['address']:
        data.append([s['address']['city'], s['address']['street']])
    else:
        # whatever behaviour you want for None values, e.g.:
        data.append([None, None])
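The truthiness test above can also be folded into a single expression: s['address'] or {} substitutes an empty dict when the value is None, and dict.get() then returns None for the missing key. A sketch on the question's second example; like if s['address']:, it also treats an empty {} as missing, which gives the same result here.

```python
import json

x = '''[{"address":{"city": "city1","street": "street1"}},
{"address": null},
{"address":{"city": "city2","street": "street2"}}]'''
source = json.loads(x)

data = []
for s in source:
    # None (or an empty dict) becomes {}, so .get() returns None instead of raising.
    address = s['address'] or {}
    data.append([address.get('city'), address.get('street')])
print(data)
```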
Upvotes: 1
Reputation: 44838
Handle the None case separately:
for s in source:
    address = s['address']
    data.append(
        [None, None] if address is None
        else [address['city'], address['street']]
    )
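If nulls can occur at more than one level, the conditional above can be generalized into a small helper that walks a path of keys and stops at the first None. A sketch; get_path is a hypothetical name, not a standard-library function, and it still raises KeyError for keys that are absent rather than null.

```python
import json

def get_path(obj, *keys):
    """Follow keys into nested dicts, returning None as soon as a value is None."""
    for key in keys:
        if obj is None:
            return None
        obj = obj[key]
    return obj

x = '''[{"address": {"city": "city1", "street": "street1"}, "person": null},
        {"address": null, "person": {"age": "31", "name": "Peter"}}]'''
source = json.loads(x)

# Hypothetical key paths; any depth works, not just two levels.
paths = [('address', 'city'), ('address', 'street'),
         ('person', 'name'), ('person', 'age')]
data = [[get_path(s, *path) for path in paths] for s in source]
print(data)
```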
Upvotes: 1