Leran

Reputation: 39

Remove rows from a DataFrame based on the values of two columns

I want to remove rows that contain outliers, with one constraint: a row may be removed only if its value in column A is not 'MOVE-UP' or 'MOVE-DOWN'. (The outlier rule is mean +/- 3 * standard deviation of column B.)

The dataset looks like this (there are many more rows in the real dataset):

   A               B
1  OK              0.34
2  OK              0.587
3  MOVE-UP         1.8
4  OK              -2.3
5  MOVE-DOWN       0.4
6  OK              0.35

Let's assume the second row is an outlier: it's OK to remove it, since its value in A is not 'MOVE-UP' or 'MOVE-DOWN'. But if the third row is an outlier, I cannot remove it, since its value in A is MOVE-UP.

Simply put, I need to remove outliers from column B, but never touch rows that have 'MOVE-UP' or 'MOVE-DOWN' in column A.

Can someone help me out here?

Upvotes: 1

Views: 90

Answers (1)

jezrael

Reputation: 863481

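I believe you need to build a mask for rows whose value in column A is not MOVE-UP or MOVE-DOWN and, if the outliers are given as a list, chain it with a second isin condition on column B. Inverting the combined mask then drops only the unprotected outliers:

L = [0.587, 1.8]
# True for rows that are outliers in B and not protected by A
mask = ~df['A'].isin(['MOVE-UP', 'MOVE-DOWN']) & df['B'].isin(L)
df1 = df[~mask]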
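
If the outliers are not given as a list but computed with the rule from the question (mean +/- 3 * standard deviation of column B), a minimal sketch could look like the following. The helper name drop_unprotected_outliers and its parameters protected and k are made up for illustration, and it assumes the mean and standard deviation are taken over all rows, including the protected ones:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': ['OK', 'OK', 'MOVE-UP', 'OK', 'MOVE-DOWN', 'OK'],
    'B': [0.34, 0.587, 1.8, -2.3, 0.4, 0.35],
})

def drop_unprotected_outliers(df, protected=('MOVE-UP', 'MOVE-DOWN'), k=3.0):
    """Drop rows whose B lies outside mean +/- k * std, unless A is protected.

    Hypothetical helper; mean and std are computed over all rows of B
    (an assumption, the question does not say which rows to use).
    """
    mean = df['B'].mean()
    std = df['B'].std()  # pandas default: sample standard deviation (ddof=1)
    is_outlier = (df['B'] - mean).abs() > k * std
    is_protected = df['A'].isin(list(protected))
    # Keep a row if it is not an outlier, or if it is protected by column A
    return df[~is_outlier | is_protected]

df1 = drop_unprotected_outliers(df)
```

On the six sample rows no value is further than 3 standard deviations from the mean, so nothing is dropped; a smaller k makes the effect visible on such a small sample.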

Upvotes: 1
