Reputation: 427
I have a dataframe and I want to filter out values of the table based on a grouped column mean.
Eg.
df = pd.DataFrame({'Year':[2020, 2021, 2020, 2021],
'Cars': ['Bentley', 'Toyota',
'Aston Martin', 'Nissan'],
'Max Speed': [380, 370, 275, 350]})
Year | Cars | Max Speed |
---|---|---|
2020 | Bentley | 380 |
2021 | Toyota | 370 |
2020 | Aston Martin | 275 |
2021 | Nissan | 350 |
I want to filter rows that have greater than the mean speed of each group.
Here the mean of each group is:
df.groupby(['Year']).mean().reset_index()
The output of the mean is:
Year | Max Speed |
---|---|
2020 | 327.5 |
2021 | 360.0 |
Eg. therefore, I want to filter the rows in the original table for values greater than the mean of each year.
The expected output will look like:
Year | Cars | Max Speed |
---|---|---|
2020 | Bentley | 380 |
2021 | Toyota | 370 |
The above table shows the desired output. Both Bentley
and Toyota
have greater max speed for the mean of the respective years.
I am not sure how to access the values with respect to the mean of each group.
Help is much appreciated.
Upvotes: 1
Views: 328
Reputation: 215107
You can use transform
to directly create the condition that aligns with the original data frame:
df[df['Max Speed'].groupby(df.Year).transform(lambda x: x > x.mean())]
Year Cars Max Speed
0 2020 Bentley 380
1 2021 Toyota 370
Upvotes: 3