Reputation: 39
I want to remove rows that contain outliers, but with one constraint: I can only remove a row if its value in column A is not 'MOVE-UP' or 'MOVE-DOWN'. (The outlier rule is mean +/- 3 * standard deviation for column B.)
The dataset looks like this (there are a lot more rows in the real dataset):
A B
1 OK 0.34
2 OK 0.587
3 MOVE-UP 1.8
4 OK -2.3
5 MOVE-DOWN 0.4
6 OK 0.35
Let's assume the second row is an outlier: it's OK to remove it, since its value in A is not 'MOVE-UP' or 'MOVE-DOWN'. But if the third row is an outlier, I cannot remove it, since its value in A is MOVE-UP.
Simply put, I need to remove outliers from column B, but with a constraint: never touch rows that have 'MOVE-UP' or 'MOVE-DOWN' in column A.
Can someone help me out here?
Upvotes: 1
Views: 90
Reputation: 863481
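I believe you need to keep every row whose value in column A is MOVE-UP or MOVE-DOWN, and drop outliers only from the remaining rows. If the outliers are given as a list, combine a second isin condition on column B with the one on column A:
L = [0.587, 1.8]  # example values of B treated as outliers
# keep a row if it is protected (MOVE-UP / MOVE-DOWN) or its B value is not an outlier
df1 = df[df['A'].isin(['MOVE-UP', 'MOVE-DOWN']) | ~df['B'].isin(L)]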
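If the outliers should come from the mean +/- 3 * standard deviation rule in the question rather than from a fixed list, something along these lines might work. This is only a sketch: the function name `drop_unprotected_outliers` and its parameters are made up for illustration, and it assumes the mean and standard deviation are computed over all rows of column B, protected rows included.

```python
import pandas as pd

def drop_unprotected_outliers(df, value_col="B", label_col="A",
                              protected=("MOVE-UP", "MOVE-DOWN"), n_std=3.0):
    """Drop rows outside mean +/- n_std * std of value_col, unless protected.

    Hypothetical helper; statistics are taken over the whole column (assumption).
    """
    mean = df[value_col].mean()
    std = df[value_col].std()  # pandas default: sample standard deviation (ddof=1)
    is_outlier = (df[value_col] - mean).abs() > n_std * std
    is_protected = df[label_col].isin(list(protected))
    # keep a row if it is protected or if it is not an outlier
    return df[is_protected | ~is_outlier]

# Made-up demo data: twenty ordinary rows plus two extreme values,
# one in an 'OK' row (removable) and one in a 'MOVE-UP' row (protected).
demo = pd.DataFrame({
    "A": ["OK"] * 20 + ["OK", "MOVE-UP"],
    "B": [0.35] * 20 + [50.0, -50.0],
})
cleaned = drop_unprotected_outliers(demo)
```

With this demo data both extreme values fall outside the 3-sigma band, but only the one in the 'OK' row is dropped. Note that with only a handful of rows, as in the sample in the question, no single value can be more than 3 standard deviations from the mean, so the rule only starts to bite on larger data.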
Upvotes: 1