Reputation: 319
I have a table df
like this, but longer and with many other type
values.
type | weight |
---|---|
a | 35.1 |
a | 36.7 |
b | 100.2 |
b | 99.3 |
b | 102.0 |
b | 5.0 |
a | 38.2 |
a | 250.8 |
I want to remove from df
all records with outliers using the 95th percentile but broken down into individual values in the type
column.
For a single value of type, I do it like this:
my_perc = 95
temp = df[df['type'] == 'a']
temp[temp.weight < np.percentile(temp.weight, my_perc)]
Now I would like to do this automatically for the whole table df
, taking into account individual groups in the type
column.
I also tried this:
df[df.groupby(['type'])['weight'] < np.percentile(df.weight, my_perc)]
But it doesn't work.
Do you have any idea for this?
Upvotes: 4
Views: 539
Reputation: 18456
You can use DataFrame.groupby
and group the dataframe on type
then calculate the percentile using np.percentile
for weight
column for each of the group to get the required result:
df.groupby('type')['weight'].apply(lambda x:x[x.le(np.percentile(x, 95))]).droplevel(axis=0, level=1).reset_index()
type weight
0 a 35.1
1 a 36.7
2 a 38.2
3 b 100.2
4 b 99.3
5 b 5.0
Upvotes: 0
Reputation: 319
Ok, probably problem solved:
my_perc = 0.95
df[df.groupby('type')['weight'].transform(lambda x : x < x.quantile(my_perc))]
Upvotes: 3