qwerty

Reputation: 887

Filter rows with the same value in a specific variable for each ID - Pandas

I want to exclude the rows of every ID whose values in a specific binary variable ("Y") are all the same. In other words, if an ID has only 0s or only 1s in Y, all of its rows should be dropped.

Data illustration:

ID  X   Y
a   ..  0
a   ..  0
a   ..  0
b   ..  1
b   ..  0
b   ..  1
b   ..  0
c   ..  1
c   ..  1
c   ..  1
c   ..  1

Expected result:

ID  X   Y
b   ..  1
b   ..  0
b   ..  1
b   ..  0

Upvotes: 2

Views: 129

Answers (2)

BENY

Reputation: 323266

Since you mentioned filter, you can use groupby().filter() and keep only the groups where Y has more than one distinct value:

df.groupby('ID').filter(lambda x : x['Y'].nunique()>1)
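
For anyone who wants to run this end to end, here is a minimal sketch. The frame is rebuilt from the question's illustration, so the ".." strings in X are placeholders and the name `mixed` is only illustrative:

```python
import pandas as pd

# Sample frame mirroring the question; the X values are placeholders.
df = pd.DataFrame({
    "ID": list("aaabbbbcccc"),
    "X": [".."] * 11,
    "Y": [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1],
})

# filter() calls the lambda once per ID group and keeps the whole group
# only when it returns True, i.e. when Y has more than one distinct value.
mixed = df.groupby("ID").filter(lambda g: g["Y"].nunique() > 1)
print(mixed)
```

Only the rows of ID b should remain, with their original index labels. Note that nunique() ignores NaN by default, so a group containing a single value plus missing entries would still be dropped.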

Upvotes: 4

anky

Reputation: 75080

Use groupby() on ID with transform('nunique') to get, for every row, the number of distinct Y values in its group, then keep the rows where that count is greater than 1:

df[df.groupby('ID')['Y'].transform('nunique')>1]

  ID   X  Y
3  b  ..  1
4  b  ..  0
5  b  ..  1
6  b  ..  0
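
To make the intermediate step visible, the same idea can be split into a count, a mask, and the final selection. This is just a sketch on the question's sample data; the ".." strings in X are placeholders, and `counts`, `mask`, and `result` are names chosen here for illustration:

```python
import pandas as pd

# Sample frame mirroring the question; the X values are placeholders.
df = pd.DataFrame({
    "ID": list("aaabbbbcccc"),
    "X": [".."] * 11,
    "Y": [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1],
})

# transform('nunique') computes the distinct-Y count per ID and broadcasts
# it back to every row, so the result is aligned with df's index.
counts = df.groupby("ID")["Y"].transform("nunique")

# Boolean mask: True for rows whose ID has both 0 and 1 in Y.
mask = counts > 1

result = df[mask]
print(result)
```

Because the mask has one entry per original row, it can be combined with other row-level conditions using & or | before indexing.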

Upvotes: 6
