Sergey Panyushkin
Sergey Panyushkin

Reputation: 75

Create new columns from json-column

I have a dataframe with column: event_name with json-objects (different types of object). I want split this column to new columns (as in json objects).

Create df:

d = [{'event_datetime': '2019-01-08 00:09:30',
  'event_json': '{"lvl":"450","tok":"1212","snum":"257","udid":"122112"}',
  'event_name': 'AdsClick'},
 {'event_datetime': '2019-01-08 00:43:21',
  'event_json': '{"lvl":"902","udid":"3123","tok":"4214","snum":"1387"}',
  'event_name': 'AdsClick'},
 {'event_datetime': '2019-02-08 00:05:01',
  'event_json': '{"lvl":"1415","udid":"214124","tok":"213123","snum":"2416","col12":"2416","col13":"2416"}'}]

df12 = json_normalize(d)

Sample:

event_datetime  event_json  event_name
0   2019-02-08 00:09:30 {"lvl":"450","tok":"1212","snum":"257","udid":...   AdsClick
1   2019-02-08 00:43:21 {"lvl":"902","udid":"3123","tok":"4214","snum"...   AdsClick
2   2019-02-08 00:05:01 {"lvl":"1415","udid":"214124","tok":"213123","...   NaN

Now I use this code:

df12 = df12.merge(df12['event_json'].apply(lambda x: pd.Series(json.loads(x))), left_index=True, right_index=True)

Result:

event_datetime  event_json  event_name  lvl snum    tok udid    col12   col13
0   2019-02-08 00:09:30 {"lvl":"450","tok":"1212","snum":"257","udid":...   AdsClick    450 257 1212    122112  NaN NaN
1   2019-02-08 00:43:21 {"lvl":"902","udid":"3123","tok":"4214","snum"...   AdsClick    902 1387    4214    3123    NaN NaN
2   2019-02-08 00:05:01 {"lvl":"1415","udid":"214124","tok":"213123","...   NaN 1415    2416    213123  214124  2416    2416

But it is very slow. Have you any idea for faster code?

Upvotes: 1

Views: 1388

Answers (1)

jezrael
jezrael

Reputation: 862541

Use list comprehension with DataFrame constructor and add to original by DataFrame.join:

df = df12.join(pd.DataFrame([json.loads(x) for x in df12['event_json']]))
print (df)
        event_datetime                                         event_json  \
0  2019-01-08 00:09:30  {"lvl":"450","tok":"1212","snum":"257","udid":...   
1  2019-01-08 00:43:21  {"lvl":"902","udid":"3123","tok":"4214","snum"...   
2  2019-02-08 00:05:01  {"lvl":"1415","udid":"214124","tok":"213123","...   

  event_name col12 col13   lvl  snum     tok    udid  
0   AdsClick   NaN   NaN   450   257    1212  122112  
1   AdsClick   NaN   NaN   902  1387    4214    3123  
2        NaN  2416  2416  1415  2416  213123  214124  

If also is necessary remove source column use DataFrame.pop:

df = df12.join(pd.DataFrame([json.loads(x) for x in df12.pop('event_json')]))
print (df)
        event_datetime event_name col12 col13   lvl  snum     tok    udid
0  2019-01-08 00:09:30   AdsClick   NaN   NaN   450   257    1212  122112
1  2019-01-08 00:43:21   AdsClick   NaN   NaN   902  1387    4214    3123
2  2019-02-08 00:05:01        NaN  2416  2416  1415  2416  213123  214124

Upvotes: 1

Related Questions