Anubhav Dikshit
Anubhav Dikshit

Reputation: 1829

parse a json in a column of dataframe

I have a dataframe(df) with the following structure.

Input:

ID data
1 {"customerinfo":{zip:834247}}
2 {"score":535,"customerinfo":{zip:834244}}

I wanted to parse the json data as a new columns, such as to get the following:

Expected Output:

ID zipcode     score
1  834247      NULL
2  834244      535

Original Solution:

df['bureauScore'] = df['data'].transform(lambda x: json.loads(x)['score'])
df['zipcode'] = df['data'].transform(lambda x: json.loads(x)['customerinfo']['zipcode'])

Problem:

If a field is missing the code fails, thus I tired to add a get function as shown below, but that fails here with the following error

Attempted Solution

df['bureauScore'] = df['data'].transform(lambda x: json.loads(get(x))['score'])

Error:

NameError: name 'get' is not defined

Any help would be appreciated.

P.S: I know about json_normalize but I have multiple fields, thus wanted this approach

Upvotes: 1

Views: 45

Answers (1)

jezrael
jezrael

Reputation: 862511

Use Series.str.get - it return NaN if no match:

#convert to jsons only once to helper Series s
s = df['data'].transform(json.loads)

df['bureauScore'] = s.str.get('score')
df['zipcode'] = s.str.get('customerinfo').str.get('zip')

print (df)

   ID                                         data  bureauScore  zipcode
0   1              {"customerinfo":{"zip":834247}}          NaN   834247
1   2  {"score":535,"customerinfo":{"zip":834244}}        535.0   834244

Upvotes: 1

Related Questions