Reputation: 1829
I have a dataframe(df) with the following structure.
Input:
ID data
1 {"customerinfo":{zip:834247}}
2 {"score":535,"customerinfo":{zip:834244}}
I wanted to parse the json data as a new columns, such as to get the following:
Expected Output:
ID zipcode score
1 834247 NULL
2 834244 535
Original Solution:
df['bureauScore'] = df['data'].transform(lambda x: json.loads(x)['score'])
df['zipcode'] = df['data'].transform(lambda x: json.loads(x)['customerinfo']['zipcode'])
Problem:
If a field is missing the code fails, thus I tired to add a get function as shown below, but that fails here with the following error
Attempted Solution
df['bureauScore'] = df['data'].transform(lambda x: json.loads(get(x))['score'])
Error:
NameError: name 'get' is not defined
Any help would be appreciated.
P.S: I know about json_normalize but I have multiple fields, thus wanted this approach
Upvotes: 1
Views: 45
Reputation: 862511
Use Series.str.get
- it return NaN
if no match:
#convert to jsons only once to helper Series s
s = df['data'].transform(json.loads)
df['bureauScore'] = s.str.get('score')
df['zipcode'] = s.str.get('customerinfo').str.get('zip')
print (df)
ID data bureauScore zipcode
0 1 {"customerinfo":{"zip":834247}} NaN 834247
1 2 {"score":535,"customerinfo":{"zip":834244}} 535.0 834244
Upvotes: 1