David Pepper
David Pepper

Reputation: 593

Pandas expand json field across records

I have an interesting problem, and I'm wondering if there's a concise, pythonic (pandastic?) way to do this, rather than iterating over rows of a data frame.

Take a DataFrame with one field that is a json encoding of information:

    Name      Data
0   Joe       '[{"label":"a","value":"1"},{"label":"b","value":"2"}]'
1   Sue       '[{"label":"a","value":"3"},{"label":"c","value":"4"}]'
2   Bob       '[{"label":"b","value":"4"},{"label":"d","value":"1"}]'

I want to expand the json field to be data fields, unioning the different column headers, to get this:

    Name      Data                 a    b    c    d
0   Joe       '[{"label":"a"...    1    2    
1   Sue       '[{"label":"a"...    3         4
2   Bob       '[{"label":"b"...         4         1

The blanks are missing values. I know I can use read_json to create data frames from the json field, but then I want to re-flatten these data frames into extra columns of the original data set.

So, is there an elegant way to do this without iterating over the various rows of the data frame? Any help would be appreciated.

Upvotes: 21

Views: 26352

Answers (1)

unutbu
unutbu

Reputation: 879691

Given

In [96]: df
Out[96]: 
  Name                   Data
0  Joe  [{"a":"1"},{"b":"2"}]
1  Sue  [{"a":"3"},{"c":"4"}]
2  Bob  [{"b":"4"},{"d":"1"}]

if you define

import json
def json_to_series(text):
    keys, values = zip(*[item for dct in json.loads(text) for item in dct.items()])
    return pd.Series(values, index=keys)

then

In [97]: result = pd.concat([df, df['Data'].apply(json_to_series)], axis=1)

In [98]: result
Out[98]: 
  Name                   Data    a    b    c    d
0  Joe  [{"a":"1"},{"b":"2"}]    1    2  NaN  NaN
1  Sue  [{"a":"3"},{"c":"4"}]    3  NaN    4  NaN
2  Bob  [{"b":"4"},{"d":"1"}]  NaN    4  NaN    1

Given

In [22]: df
Out[22]: 
  Name                                               Data
0  Joe  [{"label":"a","value":"1"},{"label":"b","value...
1  Sue  [{"label":"a","value":"3"},{"label":"c","value...
2  Bob  [{"label":"b","value":"4"},{"label":"d","value...

if you define

def json_to_series(text):
    keys, values = zip(*[(dct['label'], dct['value']) for dct in json.loads(text)])
    return pd.Series(values, index=keys)

then

In [20]: result = pd.concat([df, df['Data'].apply(json_to_series)], axis=1)

In [21]: result
Out[21]: 
  Name                                               Data    a    b    c    d
0  Joe  [{"label":"a","value":"1"},{"label":"b","value...    1    2  NaN  NaN
1  Sue  [{"label":"a","value":"3"},{"label":"c","value...    3  NaN    4  NaN
2  Bob  [{"label":"b","value":"4"},{"label":"d","value...  NaN    4  NaN    1

References:

Upvotes: 28

Related Questions