Ankita Patnaik

Reputation: 271

Pandas: Remove group rows where a particular value for that group is followed by another value

import pandas as pd

df = pd.DataFrame({"name":["A", "A","A", "A", "B" ,"B","B" ,"B", "C", "C","C", "C"],
                   "nickname":["X","Y","X","Z","X","Y","X","Y","Y", "X","Y", "Y"]})

How can I group df by "name" and drop rows in each group where 'X' is immediately followed by 'Y'? That is, only the 'X' row should be deleted in that case.

Required Output:

   name nickname
1     A        Y
2     A        X
3     A        Z
5     B        Y
7     B        Y
8     C        Y
10    C        Y
11    C        Y

Upvotes: 3

Views: 246

Answers (2)

U13-Forward

Reputation: 71580

Use boolean indexing with df[...] to filter out the unwanted rows. For this sample data, grouping is not really needed:

print(df[(df['nickname']!='X') | (df['nickname'].shift(-1)!='Y')])

Output:

   name nickname
1     A        Y
2     A        X
3     A        Z
5     B        Y
7     B        Y
8     C        Y
10    C        Y
11    C        Y
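
To make the condition easier to read, the same filter can be sketched in two steps: first mark the rows to drop, then keep the rest. The names next_nick and drop are illustrative only and do not appear in the answer. Like the one-liner above, this uses a plain shift and therefore ignores group boundaries.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
    "nickname": ["X", "Y", "X", "Z", "X", "Y", "X", "Y", "Y", "X", "Y", "Y"],
})

# Value of the following row; the last row has no successor and gets a missing value.
next_nick = df["nickname"].shift(-1)

# A row is dropped only if it is 'X' AND the row after it is 'Y'.
drop = df["nickname"].eq("X") & next_nick.eq("Y")

# Negating "X and next is Y" gives "not X or next is not Y" (De Morgan),
# which is the mask used in the answer.
result = df[~drop]
print(result)
```

Writing the rule as "drop if X and next is Y" and negating it is just a readability choice; the selected rows should be the same.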

Update (to drop 'X' when the next value is either 'Y' or 'Z'):

print(df[(df['nickname']!='X') | ~df['nickname'].shift(-1).isin(['Y','Z'])])
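
The update appears to generalize the rule to several "follower" values. A minimal sketch of that idea as a reusable function follows; the function name drop_value_before and its parameters are hypothetical, not part of the answer, and it still uses a plain shift that ignores groups.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
    "nickname": ["X", "Y", "X", "Z", "X", "Y", "X", "Y", "Y", "X", "Y", "Y"],
})


def drop_value_before(frame, value, followers, column="nickname"):
    """Drop rows where `column` equals `value` and the next row's value is in `followers`.

    Hypothetical helper for illustration; uses a plain (ungrouped) shift.
    """
    next_value = frame[column].shift(-1)
    # Keep a row if it is not `value`, or if the next value is not a listed follower.
    keep = frame[column].ne(value) | ~next_value.isin(followers)
    return frame[keep]


result = drop_value_before(df, "X", ["Y", "Z"])
print(result)
```

With followers set to ["Y"] only, this should reduce to the original filter.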

Upvotes: 2

jezrael

Reputation: 862681

Use DataFrameGroupBy.shift to shift within each group, compare with ne (the method form of !=), combine the two conditions with bitwise OR (|), and filter with boolean indexing:

m = df.groupby('name')['nickname'].shift(-1).ne('Y') | df['nickname'].ne('X')

df = df[m]
print (df)
   name nickname
1     A        Y
2     A        X
3     A        Z
5     B        Y
7     B        Y
8     C        Y
10    C        Y
11    C        Y
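
To see what the per-group shift produces, it can help to put the shifted values next to the original column. This is only an inspection sketch; the column name next_in_group is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
    "nickname": ["X", "Y", "X", "Z", "X", "Y", "X", "Y", "Y", "X", "Y", "Y"],
})

# Shift within each "name" group: the last row of every group has no
# successor inside its group, so it gets a missing value instead of the
# first value of the next group.
next_in_group = df.groupby("name")["nickname"].shift(-1)

# Show the original column and the shifted one side by side.
print(df.assign(next_in_group=next_in_group))

# Same mask as in the answer, built from the inspected column.
m = next_in_group.ne("Y") | df["nickname"].ne("X")
print(df[m])
```

Because a missing value never equals 'Y', the last row of each group always passes the first condition and is kept.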

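EDIT: With changed sample data (row 7 is now 'X' instead of 'Y'), the per-group shift and the plain shift give different results. The plain shift compares the last row of group B with the first row of group C, so it also drops row 7.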

df = pd.DataFrame({"name":["A", "A","A", "A", "B" ,"B","B" ,"B", "C", "C","C", "C"],
                   "nickname":["X","Y","X","Z","X","Y","X","X","Y", "X","Y", "Y"]})

print (df)
   name nickname
0     A        X
1     A        Y
2     A        X
3     A        Z
4     B        X
5     B        Y
6     B        X
7     B        X
8     C        Y
9     C        X
10    C        Y
11    C        Y

m = df.groupby('name')['nickname'].shift(-1).ne('Y') | df['nickname'].ne('X')

df1 = df[m]
print (df1)
   name nickname
1     A        Y
2     A        X
3     A        Z
5     B        Y
6     B        X
7     B        X
8     C        Y
10    C        Y
11    C        Y

print(df[(df['nickname']!='X') | (df['nickname'].shift(-1)!='Y')])
   name nickname
1     A        Y
2     A        X
3     A        Z
5     B        Y
6     B        X
8     C        Y
10    C        Y
11    C        Y
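
The difference between the two outputs can be pinned down by comparing the masks directly. A small sketch, assuming the edited sample data from above; the variable names grouped, plain and diff are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
    "nickname": ["X", "Y", "X", "Z", "X", "Y", "X", "X", "Y", "X", "Y", "Y"],
})

is_not_x = df["nickname"].ne("X")

# Mask using the per-group shift (next value within the same "name").
grouped = df.groupby("name")["nickname"].shift(-1).ne("Y") | is_not_x

# Mask using the plain shift (next value regardless of group).
plain = df["nickname"].shift(-1).ne("Y") | is_not_x

# Rows on which the two approaches disagree.
diff = df[grouped != plain]
print(diff)
```

Here the only disagreement should be the last row of group B, whose 'X' is followed by a 'Y' that belongs to group C.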

Upvotes: 3
