Reputation: 57
I have a dataframe1 that shows the audience's rating and the genre of each movie:
movie_id| rating | action | comedy | drama
0 4 1 1 1
1 5 0 1 0
2 3 0 1 1
1 for action means it is an action movie, and 0 means it is not.
I extracted the average rating for a single genre. Action for example, I did this:
new=df1[df1["action"]==1]
new['rating'].mean()
which shows 4. But Now I have to extract average rating for all genres which should look like this:
action | comedy | drama
4 4 3.5
Any advice on how to approach?
Upvotes: 3
Views: 1005
Reputation: 28644
Multiply the rating
column with action
, comedy
and drama
columns, replace 0 with np.nan, and compute the mean:
(df.iloc[:, 2:]
.mul(df.rating, axis = 0)
# mean implicitly excludes nulls during computations
.replace(0, np.nan)
.mean()
)
action 4.0
comedy 4.0
drama 3.5
dtype: float64
The above returns a Series, if you want a dataframe like output, pass mean
to agg:
(df.iloc[:, 2:]
.mul(df.rating, axis = 0)
.replace(0, np.nan)
.agg(['mean']) # note the `mean` is in a list
)
action comedy drama
mean 4.0 4.0 3.5
Upvotes: 1
Reputation: 7225
You can melt the genre columns and filter to only keep values equal to 1. Then group by the genres and calculate the mean.
pd.melt(
df,
value_vars=["action", "comedy", "drama"],
var_name="genre",
id_vars=["movie_id", "rating"],
).query("value == 1").groupby("genre")["rating"].mean()
which gives
genre
action 4.0
comedy 4.0
drama 3.5
Name: rating, dtype: float64
Upvotes: 2
Reputation: 323226
In your case we can select the columns then where
all 0 to NaN
and mul
with the rating
out = df.loc[:,['action','comedy','drama']].where(lambda x : x==1).mul(df.rating,axis=0).mean()
Out[377]:
action 4.0
comedy 4.0
drama 3.5
dtype: float64
If you would like a dataframe
out = out.to_frame().T
Upvotes: 4