kayak
kayak

Reputation: 57

Extract mean value for each column in Pandas

I have a dataframe1 that shows the audience's rating and the genre of each movie:

movie_id| rating | action | comedy | drama
0         4        1        1        1
1         5        0        1        0
2         3        0        1        1

1 for action means it is an action movie, and 0 means it is not.

I extracted the average rating for a single genre. Action for example, I did this:

new=df1[df1["action"]==1]
new['rating'].mean()

which shows 4. But Now I have to extract average rating for all genres which should look like this:

action | comedy | drama
4        4        3.5

Any advice on how to approach?

Upvotes: 3

Views: 1005

Answers (3)

sammywemmy
sammywemmy

Reputation: 28644

Multiply the rating column with action, comedy and drama columns, replace 0 with np.nan, and compute the mean:

(df.iloc[:, 2:]
   .mul(df.rating, axis = 0)
   # mean implicitly excludes nulls during computations
   .replace(0, np.nan) 
   .mean()
)
action    4.0
comedy    4.0
drama     3.5
dtype: float64

The above returns a Series, if you want a dataframe like output, pass mean to agg:

(df.iloc[:, 2:]
   .mul(df.rating, axis = 0)
   .replace(0, np.nan) 
   .agg(['mean']) # note the `mean` is in a list
)

      action  comedy  drama
mean     4.0     4.0    3.5

Upvotes: 1

user1717828
user1717828

Reputation: 7225

You can melt the genre columns and filter to only keep values equal to 1. Then group by the genres and calculate the mean.

pd.melt(
    df,
    value_vars=["action", "comedy", "drama"],
    var_name="genre",
    id_vars=["movie_id", "rating"],
).query("value == 1").groupby("genre")["rating"].mean()

which gives

genre
action    4.0
comedy    4.0
drama     3.5
Name: rating, dtype: float64

Upvotes: 2

BENY
BENY

Reputation: 323226

In your case we can select the columns then where all 0 to NaN and mul with the rating

out = df.loc[:,['action','comedy','drama']].where(lambda x : x==1).mul(df.rating,axis=0).mean()
Out[377]: 
action    4.0
comedy    4.0
drama     3.5
dtype: float64

If you would like a dataframe

out = out.to_frame().T

Upvotes: 4

Related Questions