Reputation: 35
I have a DataFrame of game releases and ratings:
name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,8.0,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
I want to fill the NaN values in the user_score column with the mean user_score of the same genre. For example, if a game's genre is Sports and its user_score is NaN, I want to replace that NaN with the average user_score of all Sports games.
Upvotes: 2
Views: 100
Reputation: 16147
In the data below, the user_score of the second Sports game (Wii Sports Resort) has been removed so that the code has a value to fill in.
name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
Looking at the user_score of Wii Sports Resort before the update:
df.iloc[3]['user_score']
nan
Replacing each NaN with the mean user_score of its genre:
df['user_score'] = df.groupby('genre')['user_score'].transform(lambda x: x.fillna(x.mean()))
Checking the same game after the update:
df.iloc[3]['user_score']
8.0
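For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample above is loaded from an inline string via io.StringIO, and it uses transform("mean") together with fillna as a variant of the lambda shown above rather than the exact same call. The names genre_mean and still_missing are only illustrative.

```python
import io

import pandas as pd

# Sample data from this answer (Wii Sports Resort has no user_score).
CSV = """name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,,E
Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.0,,,
"""

df = pd.read_csv(io.StringIO(CSV))

# Per-row mean user_score of the row's genre (NaN if the genre has no scores).
genre_mean = df.groupby("genre")["user_score"].transform("mean")

# Fill only the missing scores; existing scores are left untouched.
df["user_score"] = df["user_score"].fillna(genre_mean)

# Games whose genre has no known score at all stay NaN.
still_missing = df.loc[df["user_score"].isna(), "name"].tolist()

print(df[["name", "genre", "user_score"]])
print(still_missing)
```

Note that Super Mario Bros. and Pokemon Red/Pokemon Blue remain NaN here, because no other game in their genres has a score to average. If that matters, one option is a second fillna with the overall mean of the column.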
Upvotes: 1
Reputation: 9247
One possible solution is to create a dictionary, genre_avg, of the average user_score per genre and then fill the NaN values in user_score according to this dictionary:
genre_avg = data.groupby('genre')['user_score'].mean().to_dict()
data['user_score'] = data['user_score'].fillna(data['genre'].map(genre_avg))
In your small sample data nothing changes, because no game with a NaN user_score shares its genre with a game that has a score. However, if you change the genre of Wii Sports from Sports to Platform, for instance, you will see that Super Mario Bros. has its user_score filled with the average of the Platform genre games.
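The experiment described above can be sketched as follows. This is a hedged illustration: it assumes a trimmed copy of the question's data with only the name, genre and user_score columns, built by hand rather than read from a file.

```python
import numpy as np
import pandas as pd

# Trimmed copy of the question's sample (only the columns needed here).
data = pd.DataFrame({
    "name": [
        "Wii Sports",
        "Super Mario Bros.",
        "Mario Kart Wii",
        "Wii Sports Resort",
        "Pokemon Red/Pokemon Blue",
    ],
    "genre": ["Sports", "Platform", "Racing", "Sports", "Role-Playing"],
    "user_score": [8.0, np.nan, 8.3, 8.0, np.nan],
})

# Move Wii Sports into the Platform genre so that genre has a known score.
data.loc[data["name"] == "Wii Sports", "genre"] = "Platform"

# Dictionary of genre -> mean user_score (NaN for genres with no scores).
genre_avg = data.groupby("genre")["user_score"].mean().to_dict()

# Fill missing scores by looking up each row's genre in the dictionary.
data["user_score"] = data["user_score"].fillna(data["genre"].map(genre_avg))

print(data)
```

With this change Super Mario Bros. picks up the Platform average, while Pokemon Red/Pokemon Blue stays NaN because Role-Playing still has no scored game.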
Upvotes: 1