Reputation: 53
I have a df which looks like the following:
Group. | Score. |
---|---|
red | 34 |
blue | 42 |
green | 1000 |
green | 34 |
blue | 34 |
red | 42 |
I would like to add a column onto this which specifies if the value is an outlier. If there were no groups then I would use something like:
df['outliers'] = df[col] > df[col].mean() + 3 * df[col].std()
But how would I do this so it is within the groups?
Upvotes: 1
Views: 1030
Reputation: 18306
You can use GroupBy.transform:
df["is_outlier"] = df.groupby("Group.")["Score."].transform(lambda x: (x - x.mean()).abs() > 3*x.std())
Within each group, this takes each score's distance from the group mean and checks whether its absolute value exceeds 3 times the group's standard deviation.
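As a rough, self-contained sketch of the same group-wise check: the helper name flag_group_outliers and the demo data below are made up for illustration and are not from the question. One caveat worth knowing is that with the sample standard deviation (the pandas default, ddof=1), the largest possible deviation in a group of n rows is (n-1)/sqrt(n) standard deviations, so a 3-std rule cannot flag anything in groups with fewer than 11 rows. The six-row sample in the question would therefore produce no outliers, which is why the demo uses larger groups.

```python
import pandas as pd


def flag_group_outliers(df, group_col, value_col, n_std=3.0):
    """Hypothetical helper: True where a value lies more than n_std
    group standard deviations away from its own group's mean."""
    grouped = df.groupby(group_col)[value_col]
    # transform broadcasts each group's statistic back onto the original rows
    deviation = (df[value_col] - grouped.transform("mean")).abs()
    return deviation > n_std * grouped.transform("std")


# Made-up demo data: 12 rows per group so the 3-std rule can trigger at all.
demo = pd.DataFrame({
    "Group.": ["red"] * 12 + ["green"] * 12,
    "Score.": [34, 42, 38, 40, 36, 41, 35, 39, 37, 43, 40, 38]
              + [34] * 11 + [1000],
})
demo["is_outlier"] = flag_group_outliers(demo, "Group.", "Score.")

# Show only the rows flagged as outliers within their group
print(demo[demo["is_outlier"]])
```

Groups with a single row have an undefined standard deviation (NaN), so the comparison is False for them and they are never flagged.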
Upvotes: 3