Reputation:
I'd like to output a dataframe showing the mean rating (based on 'AverageRating') of each of the listed genres.
Here's a mutilated version of the table as an example:
MovieID Movie_title Action Adventure Animation AverageRating
1 Toy Story (1995) 0 0 1 3.878319
2 GoldenEye (1995) 1 1 0 3.206107
3 Four Rooms (1995) 0 0 0 3.033333
4 Get Shorty (1995) 1 0 0 3.550239
5 Copycat (1995) 0 0 0 3.302326
6 Shanghai Triad 0 0 0 3.576923
7 Twelve Monke(1995) 0 0 0 3.798469
8 Babe (1995) 0 0 0 3.995434
9 Dead Man W (1995) 0 0 0 3.896321
10 Richard III (1995) 0 0 0 3.831461
11 Seven (1995) 0 0 0 3.847458
12 Usual Suspec (1995) 0 0 0 4.385768
13 Mighty Aphro (1995) 0 0 0 3.418478
14 Postino, Il (1994) 0 0 0 3.967213
15 Mr. Holland's(1995) 0 0 0 3.778157
16 French Twist (1995) 0 0 0 3.205128
17 From Dusk Till 1 0 0 3.119565
18 White Balloon 0 0 0 2.800000
19 Antonia's Line 0 0 0 3.956522
20 Angels and Insects 0 0 0 3.416667
21 Muppet Treasure 1 1 0 2.761905
22 Braveheart (1995) 1 0 0 4.151515
23 Taxi Driver (1976) 0 0 0 4.120879
24 Rumble in the 1 1 0 3.448276
25 Birdcage 0 0 0 3.443686
So, I need to take the movies where Action = 1 etc, calculate the mean of AverageRating for those films and then create a dataframe, such as:
AverageRating
Action 2.97
Adventure 3.14
Animation 3.30
Using pd.groupby, it's simple to do one column:
df.groupby(['Action'])['AverageRating'].mean()
Action
0 3.095288
1 2.966332
Name: AverageRating, dtype: float64
...but I'm struggling to work out how to do several columns at once in the way that is needed. I know there has to be a simple way to do it that I'm missing. Any assistance greatly appreciated!
Upvotes: 3
Views: 731
Reputation: 150735
IIUC, you can multiply the category with AverageRating
and then average on mask:
cats = df.iloc[:, 2:-1]
s = cats.mul(df.AverageRating, axis='rows')
s.mask(s.eq(0)).mean()
# per comment, this is a better option
# s.mask(cats.eq(0)).mean()
Output:
Action 3.372934
Adventure 3.138763
Animation 3.878319
dtype: float64
Upvotes: 3