Reputation:
I am loading a json file with nested values. Here is how it loads:
>>> pd.read_json('/Users/david/Desktop/validate_headers/json/example_array2.json')
address firstname lastname zip_code
0 {'state': 'MI', 'town': 'Dearborn'} Jimmie Barninger 12345
1 {'state': 'CA', 'town': 'Los Angeles'} John Doe 90027
I would like to flatten the nested object, so that my end dataframe looks like this:
firstname lastname zip_code address.state address.town
Jimmie Barninger 12345 MI Dearborn
John Doe 90027 CA Los Angeles
How would I do this, that is, if the dataframe column is an object, split that column up into multiple columns (and do so until there are no json objects left)?
Upvotes: 4
Views: 3965
Reputation: 11
Use below (reference)
from pandas.io.json import json_normalize
Upvotes: 1
Reputation: 3967
Here's an approach using apply
:
df['city'] = df.address.apply(lambda x: x.split(',')[0].split(':')[1].replace("'","").replace("}",""))
df['state'] = df.address.apply(lambda x: x.split(',')[1].split(':')[1].replace("'","").replace("}",""))
df.drop(columns=['address'], inplace=True)
Upvotes: 1
Reputation: 20848
much simpler:
df = pd.DataFrame({'address': [{'state': 'MI', 'town': 'Dearborn'} , {'state': 'CA', 'town': 'Los Angeles'}], 'name':['John', 'Jane']})
df = df.join(df['address'].apply(pd.Series))
then
df.drop(columns='address')
Upvotes: 8
Reputation: 75150
if your address
column is not a dictionary, you can convert to one by:
import ast
df.address = [ast.literal_eval(df.address[i]) for i in df.index]
then :
df.address.apply(pd.Series)
state town
0 MI Dearborn
1 CA Los Angeles
Though not sure about the length of your dataset, this can also be achieved by:
def literal_return(val):
try:
return ast.literal_eval(val)
except (ValueError, SyntaxError) as e:
return val
df.address.apply(literal_return)
>>%timeit [ast.literal_eval(df.address[i]) for i in df.index]
144 µs ± 2.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>%timeit df.address.apply(literal_return)
454 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 3