Getting column mean in groupby clause python pandas

Question

I have a dataset of actors and directors and the popularity of the movie that they have worked together in.

print (actors_director_df.head(3))

                 actor         director  popularity counter
0          Chris Pratt  Colin Trevorrow   32.985763       0
1  Bryce Dallas Howard  Colin Trevorrow   32.985763       0
2          Irrfan Khan  Colin Trevorrow   32.985763       0

I want to group by using actor and director because a pair can work in more than one film. which I successfully did use below query.

actor_director_grouped = actors_director_df.groupby(['actor','director']) \
                         .size() \
                         .reset_index(name='count') \
                         .sort_values(['count'], ascending=False) \
                         .head(10)

print (actor_director_grouped)

                      actor            director  count
3619         Clint Eastwood      Clint Eastwood     14
19272           Woody Allen         Woody Allen     12
9606            Johnny Depp          Tim Burton      8

But the popularity column goes missing in this DF.

What I want to do is to do a mean of popularity column after groupby and show the mean in front of actor and director along with the count of the number of movies they did together.

i.e. my ideal output would be something like this.

                      actor            director  popularity count
3619         Clint Eastwood      Clint Eastwood   32.985763    14
19272           Woody Allen         Woody Allen   5.1231231    12
9606            Johnny Depp          Tim Burton   3.1231231    8

Anton vBR · Accepted Answer

Looking at your dataframe the counter columns seems unnecessary. Let us instead use the popularity column and make one mean and one count column:

import pandas as pd
import numpy as np

np.random.seed(444)

names = [
    'Robert Baratheon',
    'Jon Snow',
    'Daenerys Targaryen',
    'Theon Greyjoy',
    'Tyrion Lannister'
]

df = pd.DataFrame({
    'actor': np.random.choice(names, size=10, p = [0.2,0.2,0.2,0.1,0.3]),
    'director': np.random.choice(names, size=10, p = [0.4,0.1,0.1,0.1,0.3]),
    'popularity': np.random.randint(0,100, size=10),
    'counter': 0
})

df2 = df.groupby(['actor','director'])['popularity']\
        .agg(['count', 'mean'])\
        .reset_index()\
        .sort_values(by='mean', ascending=False)

print(df2)

Returns:

              actor          director  count  mean
0          Jon Snow  Robert Baratheon      2  53.5
5  Tyrion Lannister  Tyrion Lannister      2  49.0
2  Robert Baratheon  Tyrion Lannister      2  48.5
1  Robert Baratheon          Jon Snow      2  40.5
4     Theon Greyjoy  Tyrion Lannister      1  13.0
3     Theon Greyjoy  Robert Baratheon      1   7.0

Getting column mean in groupby clause python pandas

Answers (2)

Related Questions