G1124E
G1124E

Reputation: 427

Conditionally filter rows based on mean of a category in pandas

I have a dataframe and I want to filter out values of the table based on a grouped column mean.

Eg.

df = pd.DataFrame({'Year':[2020, 2021, 2020, 2021],
                     'Cars': ['Bentley', 'Toyota',
                           'Aston Martin', 'Nissan'],
                    'Max Speed': [380, 370, 275, 350]})
Year Cars Max Speed
2020 Bentley 380
2021 Toyota 370
2020 Aston Martin 275
2021 Nissan 350

I want to filter rows that have greater than the mean speed of each group.

Here the mean of each group is:

df.groupby(['Year']).mean().reset_index()

The output of the mean is:

Year Max Speed
2020 327.5
2021 360.0

Eg. therefore, I want to filter the rows in the original table for values greater than the mean of each year.

The expected output will look like:

Year Cars Max Speed
2020 Bentley 380
2021 Toyota 370

The above table shows the desired output. Both Bentley and Toyota have greater max speed for the mean of the respective years.

I am not sure how to access the values with respect to the mean of each group.

Help is much appreciated.

Upvotes: 1

Views: 328

Answers (1)

akuiper
akuiper

Reputation: 215107

You can use transform to directly create the condition that aligns with the original data frame:

df[df['Max Speed'].groupby(df.Year).transform(lambda x: x > x.mean())]

   Year     Cars  Max Speed
0  2020  Bentley        380
1  2021   Toyota        370

Upvotes: 3

Related Questions