Aman Singh
Aman Singh

Reputation: 1241

mapping missing values in one column of pandas dataframe using dictionary with reference to another column values

I have a dataframe as

> print(df)
[Out:]
activity-code    activity
-------------------------
0                unknown
99               NaN
84               sports
72;99            NaN
57               recreational
57;99;11         NaN
11               NaN

and a dictionary with activity-codes as keys,

> print(act_dict)
[Out:]
{10: 'unknown',
11: 'cultural',
57: 'recreational',
72: 'social service',
84: 'sports',
99: 'education'}

All the values inside the dataframe are stored as strings even the activity-code has values as string. Whereas the dictionary keys are of integer type I want to somehow map and replace with missing values in activity using the dictionary with reference to the values stored in activity-code column. So the desired output dataframe should be something like this,

> print(df)
[Out:]
activity-code    activity
-------------------------
0                unknown
99               education
84               sports
72;99            social service;education
57               recreational
57;99;11         recreational;education;cultural
11               cultural

This is what I've tried so far,

df['new-activity'] = df['activity-code'].str.split(';').apply(lambda x: ';'.join([act_dict[int(i)] for i in x]))

but I'm getting KeyError for single values where the activity-codes aren't single code values. The error says KeyError: 0

How do i map the dictionary values to the missing values in activity column of dataframe?

Upvotes: 1

Views: 994

Answers (2)

Loochie
Loochie

Reputation: 2472

Well in case there is no values against 0 in your dictionary you can use filter():

df['activity']= df['activity-code'].apply(lambda x:'; '.join(list(filter(None,map(act_dict.get,list(map(int,x.split(';'))))))))

Upvotes: 0

U13-Forward
U13-Forward

Reputation: 71610

Use apply and str.split, than in apply, use a list comprehension and join it by ';':

df['activity'] = df['activity-code'].str.split(';').apply(lambda x: ';'.join([act_dict[int(i)] for i in x]))

And now:

print(df)

Output:

  activity-code                         activity
0             0                          unknown
1            99                        education
2            84                           sports
3         72;99         social service;education
4            57                     recreational
5      57;99;11  recreational;education;cultural
6            11                         cultural

Upvotes: 2

Related Questions