Reputation: 85
I have a dataset with lots of NaN values, and I would like to fill them based on another column's value. Here is an example.
Ind Init Desc
1 A Apple
2 A Apple
3 A NaN
4 B NaN
5 B Banana
6 B Banana
7 C Cherry
8 C NaN
9 C Cherry
10 D NaN
11 D NaN
12 D NaN
13 A NaN
14 A NaN
15 A Apple
I cannot simply use df.fillna('apple') because the fill value has to be dynamic. I also cannot use either method='ffill' or method='bfill' on its own because, in the case of A, it should be ffill, and in the case of B, it should be bfill. Also, in the case of D, it should say 'No fruit description available!'
You may assume that Init is never missing and that each unique Init has only one fruit description.
What would be the best way to handle this case?
Upvotes: 1
Views: 536
Reputation: 9481
Something like this?
mapping_dict = {'A': 'Apple', 'B': 'Banana', 'C':'Cherry', 'D':'no fruit description available'}
df['Desc'] = df['Init'].map(mapping_dict)
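If hardcoding the dictionary is not an option, the same mapping could probably be derived from the rows that already have a description. This is only a sketch: it rebuilds the example frame from the question and relies on the stated assumption of one description per Init. The default text for unmatched values is taken from the question.

```python
import numpy as np
import pandas as pd

# Example data from the question (Ind column omitted for brevity).
df = pd.DataFrame({
    'Init': list('AAABBBCCCDDDAAA'),
    'Desc': ['Apple', 'Apple', np.nan, np.nan, 'Banana', 'Banana',
             'Cherry', np.nan, 'Cherry', np.nan, np.nan, np.nan,
             np.nan, np.nan, 'Apple'],
})

# Build Init -> Desc from rows that have a description, one row per Init.
known = df.dropna(subset=['Desc']).drop_duplicates('Init')
mapping_dict = known.set_index('Init')['Desc'].to_dict()

# Init values absent from the mapping (here 'D') map to NaN,
# which is then replaced by the default text.
df['Desc'] = df['Init'].map(mapping_dict).fillna('No fruit description available!')
```

This way a new Init value with a known description is picked up without editing the dictionary by hand.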
Upvotes: 1
Reputation: 75100
You can use something like:
# transform keeps the original index, so the result aligns with df on assignment
df['Desc1'] = (df.groupby('Init')['Desc']
                 .transform(lambda x: x.ffill().bfill())
                 .fillna('No fruit description available!'))
print(df)
Ind Init Desc Desc1
0 1 A Apple Apple
1 2 A Apple Apple
2 3 A NaN Apple
3 4 B NaN Banana
4 5 B Banana Banana
5 6 B Banana Banana
6 7 C Cherry Cherry
7 8 C NaN Cherry
8 9 C Cherry Cherry
9 10 D NaN No fruit description available!
10 11 D NaN No fruit description available!
11 12 D NaN No fruit description available!
12 13 A NaN Apple
13 14 A NaN Apple
14 15 A Apple Apple
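Since the question states there is only one description per Init, a possible variant is to broadcast each group's first non-null value instead of filling in both directions. The following is a self-contained sketch under that assumption; the frame is rebuilt from the question's example, and Desc1 is the same illustrative column name used above.

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    'Ind': range(1, 16),
    'Init': list('AAABBBCCCDDDAAA'),
    'Desc': ['Apple', 'Apple', np.nan, np.nan, 'Banana', 'Banana',
             'Cherry', np.nan, 'Cherry', np.nan, np.nan, np.nan,
             np.nan, np.nan, 'Apple'],
})

# 'first' skips missing values, so each row receives its group's first
# non-null description; groups with no description at all stay missing
# and are then given the default text.
df['Desc1'] = (
    df.groupby('Init')['Desc']
      .transform('first')
      .fillna('No fruit description available!')
)
print(df)
```

If an Init could carry several different descriptions, this would silently pick the first one, so it is only appropriate while the one-description assumption holds.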
Upvotes: 2