import pandas as pd
df = pd.DataFrame({"name": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
                   "nickname": ["X", "Y", "X", "Z", "X", "Y", "X", "Y", "Y", "X", "Y", "Y"]})
How can I group df by "name" and, within each group, drop every row where 'X' is immediately followed by 'Y'? That is, the 'X' row should be deleted in that case.
Required Output:
1 A Y
2 A X
3 A Z
5 B Y
7 B Y
8 C Y
10 C Y
11 C Y
Use boolean indexing with df[...] to filter out the rows that are not needed; for this data, grouping is not really needed:
print(df[(df['nickname']!='X') | (df['nickname'].shift(-1)!='Y')])
Output:
name nickname
1 A Y
2 A X
3 A Z
5 B Y
7 B Y
8 C Y
10 C Y
11 C Y
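A quick check of the claim that grouping is not needed here, offered as a sketch: in the question's data no "name" group ends with an 'X', so shifting the whole column and shifting within each group should produce the same mask. The variable names `plain` and `grouped` are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"name": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
                   "nickname": ["X", "Y", "X", "Z", "X", "Y", "X", "Y", "Y", "X", "Y", "Y"]})

# Mask that ignores groups: shift over the whole column
plain = (df['nickname'] != 'X') | (df['nickname'].shift(-1) != 'Y')

# Mask that shifts within each 'name' group (last row of each group gets NaN)
grouped = (df['nickname'] != 'X') | (df.groupby('name')['nickname'].shift(-1) != 'Y')

# No group ends with 'X' in this data, so the two masks agree
print(plain.equals(grouped))
print(df[plain].index.tolist())
```

If a group could end with an 'X' and the next group start with a 'Y', the two masks would diverge, which is what the per-group shift guards against.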
Update (this version drops an 'X' that is immediately followed by either 'Y' or 'Z'):
print(df[(df['nickname'] != 'X') | ~df['nickname'].shift(-1).isin(['Y', 'Z'])])
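The update above shifts over the whole column, so an 'X' at the end of one "name" group is compared with the first row of the next group. A minimal per-group variant, assuming the goal is still to drop an 'X' followed by 'Y' or 'Z' only within the same name, could combine the update's isin with a grouped shift (`next_in_group` and `mask` are illustrative names):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"name": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
                   "nickname": ["X", "Y", "X", "Z", "X", "Y", "X", "Y", "Y", "X", "Y", "Y"]})

# Next nickname within the same 'name' group (NaN for the last row of each group)
next_in_group = df.groupby('name')['nickname'].shift(-1)

# Keep a row unless it is an 'X' whose next row in the group is 'Y' or 'Z';
# NaN is never in the list, so the last row of each group is always kept
mask = df['nickname'].ne('X') | ~next_in_group.isin(['Y', 'Z'])
result = df[mask]
print(result)
```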
Use DataFrameGroupBy.shift to shift within each group, compare with ne for !=, chain the two conditions with bitwise OR (|), and filter by boolean indexing:
m = df.groupby('name')['nickname'].shift(-1).ne('Y') | df['nickname'].ne('X')
df = df[m]
print (df)
name nickname
1 A Y
2 A X
3 A Z
5 B Y
7 B Y
8 C Y
10 C Y
11 C Y
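To see what the mask is built from, it may help to materialize the per-group shift as a helper column. This is a sketch; the columns `next_nickname` and `to_drop` are illustrative and not part of the answer. By De Morgan's law, keeping rows where `~(eq('X') & next.eq('Y'))` holds is the same as the answer's `next.ne('Y') | ne('X')`.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"name": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
                   "nickname": ["X", "Y", "X", "Z", "X", "Y", "X", "Y", "Y", "X", "Y", "Y"]})

# Next nickname within each 'name' group; the last row of every group gets NaN
df['next_nickname'] = df.groupby('name')['nickname'].shift(-1)

# Rows to drop: an 'X' whose next row in the same group is 'Y'
df['to_drop'] = df['nickname'].eq('X') & df['next_nickname'].eq('Y')
print(df)

# Keep the remaining rows and only the original columns
result = df.loc[~df['to_drop'], ['name', 'nickname']]
print(result)
```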
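EDIT: the grouped solution differs from a plain shift when the last 'X' of one group is followed by a 'Y' at the start of the next group, as with this data: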
df = pd.DataFrame({"name":["A", "A","A", "A", "B" ,"B","B" ,"B", "C", "C","C", "C"],
"nickname":["X","Y","X","Z","X","Y","X","X","Y", "X","Y", "Y"]})
print (df)
name nickname
0 A X
1 A Y
2 A X
3 A Z
4 B X
5 B Y
6 B X
7 B X
8 C Y
9 C X
10 C Y
11 C Y
m = df.groupby('name')['nickname'].shift(-1).ne('Y') | df['nickname'].ne('X')
df1 = df[m]
print (df1)
name nickname
1 A Y
2 A X
3 A Z
5 B Y
6 B X
7 B X
8 C Y
10 C Y
11 C Y
print(df[(df['nickname']!='X') | (df['nickname'].shift(-1)!='Y')])
name nickname
1 A Y
2 A X
3 A Z
5 B Y
6 B X
8 C Y
10 C Y
11 C Y
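The difference above comes down to row 7: it is the last row of group B, so the per-group shift gives it NaN and keeps it, while the plain shift compares it with row 8 ('C', 'Y') and drops it. A sketch that wraps the grouped approach in a reusable function and checks that difference; the function name `drop_first_before_second` and its parameters are hypothetical, not part of pandas:

```python
import pandas as pd


def drop_first_before_second(df, group_col, value_col, first, second):
    """Hypothetical helper: drop rows where value_col == first and the next
    row in the same group_col group has value_col == second."""
    next_val = df.groupby(group_col)[value_col].shift(-1)
    keep = df[value_col].ne(first) | next_val.ne(second)
    return df[keep]


# The edited data from the answer, where group 'B' ends with an 'X'
df = pd.DataFrame({"name": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
                   "nickname": ["X", "Y", "X", "Z", "X", "Y", "X", "X", "Y", "X", "Y", "Y"]})

grouped = drop_first_before_second(df, 'name', 'nickname', 'X', 'Y')
plain = df[(df['nickname'] != 'X') | (df['nickname'].shift(-1) != 'Y')]

# Rows kept by the per-group version but dropped by the plain shift
print(grouped.index.difference(plain.index).tolist())
```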