Talal Ghannam

Reputation: 189

Replacing NaN values with group mean

I have a DataFrame made of countries, years, and many other features. There are many years for a single country:

country  year population.....  etc.
1        2000   5000
1        2001    NaN
1        2002   4800
2        2000

There are many NaN values in the DataFrame. I want to replace each NaN, in every column, with that country's average for the column.

For example, for the NaN in the population column for country 1, year 2001, I want to use the average population of country 1 over all its years: (5000 + 4800) / 2 = 4900. I am currently using groupby().mean() to find the mean for each country, but I am running into two difficulties:

1. Some means come out as NaN even though I am sure the group contains at least one value. Why is that?
2. How can I access specific values in the groupby result? In other words, how can I replace every NaN with its correct group average?

Thanks a lot.

Upvotes: 0

Views: 1135

Answers (1)

BENY

Reputation: 323306

Use combine_first with the per-country mean from groupby:

df.combine_first(df.groupby('country').transform('mean'))

Or

df.fillna(df.groupby('country').transform('mean'))
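As a minimal sketch of the fillna variant, the snippet below builds a small frame modeled on the question's table and fills each NaN with its country's mean. The sample values for country 2 and the names value_cols, group_means, and filled are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modeled on the question's table.
# Country 2 is all-NaN here on purpose (an assumption for illustration).
df = pd.DataFrame({
    "country": [1, 1, 1, 2, 2],
    "year": [2000, 2001, 2002, 2000, 2001],
    "population": [5000.0, np.nan, 4800.0, np.nan, np.nan],
})

# Columns to fill; the grouping key and the year are deliberately left out.
value_cols = ["population"]

# transform("mean") returns one row per original row, holding the mean
# of that row's country, so it lines up with df by index.
group_means = df.groupby("country")[value_cols].transform("mean")

# fillna only touches cells that are NaN; existing values are kept.
filled = df.copy()
filled[value_cols] = df[value_cols].fillna(group_means)

print(filled)
```

In this sketch the NaN for country 1 becomes 4900.0, while the rows for country 2 stay NaN because that group has no non-missing value to average. That is one situation in which a group mean itself comes out as NaN; it may or may not be the cause in the asker's data.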

Upvotes: 2
