Reputation: 109
I have a df with multiple columns:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Store':['M1','M2','M3','M1','M1','M2','M2','M3','M3'],
'Category':['A','A','A','B','B','B','C','C','C'],
'Price_Category':[np.nan,'X',np.nan,np.nan,'Y','Y','Z',np.nan,'Z']})
How do I fill the NaN values in Price_Category with the mode, computed per Category only?
I tried to use:
df['Price_Category'] = df.groupby('Category')['Price_Category'].apply(lambda x: x.fillna(x.mode()[0]))
But I get this error: KeyError: 0
Other approaches I tried filled the NaN values with the name from the Store column instead.
Thanks for your help!
Upvotes: 4
Views: 480
Reputation: 863291
Use Series.iat to select the first value of the Series returned by mode() by position:
f = lambda x: x.fillna(x.mode().iat[0])
df['Price_Category'] = df.groupby('Category')['Price_Category'].apply(f)
print (df)
  Store Category Price_Category
0    M1        A              X
1    M2        A              X
2    M3        A              X
3    M1        B              Y
4    M1        B              Y
5    M2        B              Y
6    M2        C              Z
7    M3        C              Z
8    M3        C              Z
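To see what the lambda does on a single group, here is a minimal sketch. The Series s is made-up example data standing in for one group's Price_Category values; it is not taken from the question's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical values of one group, as the lambda would receive them
s = pd.Series([np.nan, 'Y', 'Y'])

# mode() ignores NaN by default (dropna=True) and returns a Series,
# because there can be more than one most frequent value
modes = s.mode()

# iat[0] picks the first mode by position, independent of index labels
first_mode = modes.iat[0]

# Replace the missing values in this group with that mode
filled = s.fillna(first_mode)
print(filled.tolist())

# A group that holds only NaNs has no mode, so mode() comes back empty
empty_modes = pd.Series([np.nan, np.nan], dtype=object).mode()
print(len(empty_modes))
```

Since mode() can return several values when there is a tie, taking the first one is a simple tie-break rather than the only possible choice.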
If you get the error:
IndexError: index 0 is out of bounds for axis 0 with size 0
it means that one or more groups contain only NaNs (the data below is changed to raise the error):
df = pd.DataFrame({'Store':['M1','M2','M3','M1','M1','M2','M2','M3','M3'],
'Category':['A','A','A','B','B','B','C','C','C'],
'Price_Category':[np.nan,'X',np.nan,np.nan,'Y','Y',np.nan,np.nan,np.nan]})
# fill only if the group has at least one non-NaN value, else return the group unchanged (all NaNs)
f = lambda x: x.fillna(x.mode().iat[0]) if x.notna().any() else x
df['Price_Category'] = df.groupby('Category')['Price_Category'].apply(f)
print (df)
  Store Category Price_Category
0    M1        A              X
1    M2        A              X
2    M3        A              X
3    M1        B              Y
4    M1        B              Y
5    M2        B              Y
6    M2        C            NaN
7    M3        C            NaN
8    M3        C            NaN
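A possible variation, not part of the original approach: in pandas 2.0 and later, groupby(...).apply may prepend the group keys to the result index (group_keys defaults to True), which can break the assignment back to df. The sketch below uses transform instead, which returns a result aligned to the original index. The helper name fill_with_mode is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Store': ['M1', 'M2', 'M3', 'M1', 'M1', 'M2', 'M2', 'M3', 'M3'],
                   'Category': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'Price_Category': [np.nan, 'X', np.nan, np.nan, 'Y', 'Y',
                                      np.nan, np.nan, np.nan]})

def fill_with_mode(x):
    """Fill NaNs in one group with its first mode (hypothetical helper).

    Groups that contain only NaNs have no mode and are returned unchanged.
    """
    modes = x.mode()
    return x.fillna(modes.iat[0]) if not modes.empty else x

# transform keeps the original row index, so the result can be assigned directly
df['Price_Category'] = df.groupby('Category')['Price_Category'].transform(fill_with_mode)
print(df)
```

Passing group_keys=False to groupby and keeping apply should be another way to get an index-aligned result, if you prefer to stay close to the code above.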
Upvotes: 3