Reputation: 105
I have this dataset. You can see there's a null Value for the Product Wheat. I want to fill this null Value with the mean of that product's category. Wheat is in category A, which has mean (5+8)/2 = 6.5.
  Product    Value Category
0 Rice       5     A
1 Corn       8     A
2 Milk       17    B
3 Wheat      NaN   A
4 Ice cream  3     B
Here's what I have tried.
df['Value'].fillna(df.groupby('Category')['Value'].mean(), inplace=True)
But this does not work: the groupby returns one mean per category rather than one value per row, so the NaN is not filled. How can I achieve this? Thanks for your help.
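A minimal sketch of why the attempt leaves the NaN in place, assuming the data is rebuilt by hand as below (the construction is an assumption, not the original loading code). The grouped mean is indexed by category label, while the rows are indexed 0 to 4, and fillna matches a Series argument on index labels.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand (assumed construction)
df = pd.DataFrame({
    "Product": ["Rice", "Corn", "Milk", "Wheat", "Ice cream"],
    "Value": [5, 8, 17, np.nan, 3],
    "Category": ["A", "A", "B", "A", "B"],
})

# One mean per category, indexed by the labels 'A' and 'B'
means = df.groupby("Category")["Value"].mean()

# fillna aligns on index labels; the row labels 0..4 never match 'A' or 'B',
# so nothing is filled
filled = df["Value"].fillna(means)

print(means)
print(filled)
```

The fix is to produce a Series of means that has the same index as the rows, which is what the answers below do in different ways.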
Upvotes: 0
Views: 934
Reputation: 8768
Try:
df['Value'] = df.groupby('Category')['Value'].transform(lambda x: x.fillna(x.mean()))
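For reference, a self-contained sketch of this approach on the sample data from the question. The DataFrame construction is an assumption about how the data is stored. transform returns a result with the same index as the original rows, which is why it can be assigned straight back to the column.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand (assumed construction)
df = pd.DataFrame({
    "Product": ["Rice", "Corn", "Milk", "Wheat", "Ice cream"],
    "Value": [5, 8, 17, np.nan, 3],
    "Category": ["A", "A", "B", "A", "B"],
})

# Within each category, replace NaN with that category's mean;
# the result is aligned row by row with df
df["Value"] = df.groupby("Category")["Value"].transform(lambda x: x.fillna(x.mean()))

print(df)
```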
Upvotes: 2
Reputation: 27
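# Mean Value per category, as a two-column frame
data1 = data.groupby('Category')['Value'].mean().reset_index()
data1.columns = ['Category', 'Mean']
# Attach each row's category mean, fill the gaps, then drop the helper column
data = data.join(data1.set_index('Category'), on='Category')
data['Value'] = data['Value'].fillna(data['Mean'])
data = data.drop('Mean', axis=1)
data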
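The same lookup can probably be written without the temporary Mean column by mapping each row's category to its mean with Series.map. This is a swapped-in variant rather than the code above, and the DataFrame construction and the name category_means are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand (assumed construction)
data = pd.DataFrame({
    "Product": ["Rice", "Corn", "Milk", "Wheat", "Ice cream"],
    "Value": [5, 8, 17, np.nan, 3],
    "Category": ["A", "A", "B", "A", "B"],
})

# Series indexed by category: A -> 6.5, B -> 10.0
category_means = data.groupby("Category")["Value"].mean()

# map looks up each row's category in category_means, giving a row-aligned Series
data["Value"] = data["Value"].fillna(data["Category"].map(category_means))

print(data)
```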
Upvotes: 0
Reputation: 46
You can use GroupBy + transform to fill NaN values with group-wise means.
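# Fill each NaN with the mean of its own category
df['Value'] = df['Value'].fillna(df.groupby('Category')['Value'].transform('mean'))
# Fall back to the overall mean for any category that has no non-null values
df['Value'] = df['Value'].fillna(df['Value'].mean())
df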
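A small sketch of when the second fillna matters. The extra "Tea" row in a category "C" is hypothetical and not part of the question's data; it is added only so that one category has no observed values at all.

```python
import numpy as np
import pandas as pd

# Question's data plus one hypothetical row ("Tea", category "C") with no value
df = pd.DataFrame({
    "Product": ["Rice", "Corn", "Milk", "Wheat", "Ice cream", "Tea"],
    "Value": [5, 8, 17, np.nan, 3, np.nan],
    "Category": ["A", "A", "B", "A", "B", "C"],
})

# Row-aligned category means; category C has no data, so its mean is NaN
group_means = df.groupby("Category")["Value"].transform("mean")
df["Value"] = df["Value"].fillna(group_means)
still_missing = int(df["Value"].isna().sum())  # the Tea row is still NaN here

# Fallback: fill what is left with the overall mean of the column
df["Value"] = df["Value"].fillna(df["Value"].mean())

print(df)
```

Note that the overall mean is taken after the first fill, so it already includes the value filled in for Wheat.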
Upvotes: 1
Reputation: 16327
Try this,
import numpy as np

for index, row in df.iterrows():
    product = row['Product']
    value = row['Value']
    category = row['Category']
    if np.isnan(value):
        # Mean of the non-null values in this row's category
        category_mean = df.groupby('Category')['Value'].mean()[category]
        print(f'category_mean : {category_mean}')
        df.loc[index, 'Value'] = category_mean
    else:
        print(product, value, category)
https://github.com/biranchi2018/My_ML_Examples/blob/master/16.Stackoverflow_Pandas.ipynb
Upvotes: 1