user10332687
user10332687

Reputation:

How to flatten a json field in pandas

I am loading a json file with nested values. Here is how it loads:

>>> pd.read_json('/Users/david/Desktop/validate_headers/json/example_array2.json')
                                  address firstname   lastname  zip_code
0     {'state': 'MI', 'town': 'Dearborn'}    Jimmie  Barninger     12345
1  {'state': 'CA', 'town': 'Los Angeles'}      John        Doe     90027

I would like to flatten the nested object, so that my end dataframe looks like this:

firstname   lastname    zip_code    address.state   address.town
Jimmie      Barninger   12345       MI              Dearborn
John        Doe         90027       CA              Los Angeles

How would I do this, that is, if the dataframe column is an object, split that column up into multiple columns (and do so until there are no json objects left)?

Upvotes: 4

Views: 3965

Answers (4)

Alexandre Savio
Alexandre Savio

Reputation: 11

Use below (reference)

from pandas.io.json import json_normalize

Upvotes: 1

meW
meW

Reputation: 3967

Here's an approach using apply:

df['city'] = df.address.apply(lambda x: x.split(',')[0].split(':')[1].replace("'","").replace("}",""))
df['state'] = df.address.apply(lambda x: x.split(',')[1].split(':')[1].replace("'","").replace("}",""))
df.drop(columns=['address'], inplace=True)

Upvotes: 1

MrE
MrE

Reputation: 20848

much simpler:

df = pd.DataFrame({'address': [{'state': 'MI', 'town': 'Dearborn'} , {'state': 'CA', 'town': 'Los Angeles'}], 'name':['John', 'Jane']})

df = df.join(df['address'].apply(pd.Series))

then

df.drop(columns='address')

Upvotes: 8

anky
anky

Reputation: 75150

if your address column is not a dictionary, you can convert to one by:

import ast
df.address = [ast.literal_eval(df.address[i]) for i in df.index]

then :

df.address.apply(pd.Series)

    state   town
0   MI  Dearborn
1   CA  Los Angeles

Though not sure about the length of your dataset, this can also be achieved by:

def literal_return(val):
try:
    return ast.literal_eval(val)
except (ValueError, SyntaxError) as e:
    return val
df.address.apply(literal_return)

>>%timeit [ast.literal_eval(df.address[i]) for i in df.index]
144 µs ± 2.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>%timeit df.address.apply(literal_return)
454 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Upvotes: 3

Related Questions