Reputation: 189
I have a dataframe made up of countries, years, and many other features. Each country has rows for many years:
country  year  population  ..... etc.
1        2000  5000
1        2001  NaN
1        2002  4800
2        2000
There are many NaN values in the dataframe. In every column, I want to replace each NaN belonging to a specific country with that country's average for the column.
For example, for the NaN in the population column for country 1, year 2001, I want to use the average population of country 1 over all years = (5000+4800)/2. I am currently using groupby().mean() to find the means for each country, but I am running into the following difficulties:
1- Some means come out as NaN when I know for sure that there is a value. Why is that?
2- How can I access specific values in the groupby result? In other words, how can I replace every NaN with its correct average?
Thanks a lot.
Upvotes: 0
Views: 1135
Reputation: 323306
Use combine_first with a groupby mean:
df.combine_first(df.groupby('country').transform('mean'))
Or with fillna:
df.fillna(df.groupby('country').transform('mean'))
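As a minimal sketch, here is the fillna variant run on made-up sample data. The frame below is hypothetical and only mirrors the layout in the question, and it assumes every non-key column is numeric. It also shows one possible reason a group mean can come out as NaN: if every value for a country is missing, there is nothing to average, so those cells stay NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the shape described in the question.
df = pd.DataFrame({
    "country": [1, 1, 1, 2, 2],
    "year": [2000, 2001, 2002, 2000, 2001],
    "population": [5000.0, np.nan, 4800.0, np.nan, np.nan],
})

# Per-country mean of each column, broadcast back onto the original rows.
# The result has the same index as df, so it lines up cell by cell.
country_means = df.groupby("country").transform("mean")

# Fill each NaN with the mean from the matching row and column.
filled = df.fillna(country_means)
print(filled)
```

Country 1's missing 2001 population is filled with the mean of 5000 and 4800. Country 2 has no population values at all in this sample, so its group mean is itself NaN and the gaps remain; a second fillna with some other fallback would be needed for such groups.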
Upvotes: 2