Reputation: 41
I want to slice a DataFrame in pandas like this:
index a b
0 A -
1 A +
2 A -
3 B -
4 C +
5 C -
Within each group of column a, I want to keep all the rows from the first '+' onward and delete the leading '-' rows; a group that contains no '+' at all is dropped entirely. The outcome should be like this:
index a b
1 A +
2 A -
4 C +
5 C -
How can I do this?
Upvotes: 1
Views: 99
Reputation: 862611
Use GroupBy.cummax on a boolean column that compares b with '+'. This keeps all rows from the first '+' onward within each group:
df1 = df[
    df.assign(new=lambda x: x['b'].eq('+'))
      .groupby('a')['new']
      .cummax()
]
print(df1)
a b
1 A +
2 A -
4 C +
5 C -
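To see why cummax does the job, here is a minimal sketch that splits the one-liner into steps. The frame is rebuilt by hand from the question's sample, and the names is_plus, keep and steps are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the question (default integer index 0..5).
df = pd.DataFrame({
    "a": ["A", "A", "A", "B", "C", "C"],
    "b": ["-", "+", "-", "-", "+", "-"],
})

# Step 1: True where b is '+', False elsewhere.
is_plus = df["b"].eq("+")

# Step 2: running maximum within each group of a. Once a True has been
# seen in a group, every later row of that group is True as well.
keep = is_plus.groupby(df["a"]).cummax()

# Side-by-side view of the intermediate columns.
steps = df.assign(is_plus=is_plus, keep=keep)
print(steps)

# Step 3: boolean indexing with the per-group mask.
df1 = df[keep]
print(df1)
```

Group B never contains a '+', so its mask stays False and the group disappears from the result.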
Upvotes: 1
Reputation: 260600
A simpler syntax is to call groupby directly on the boolean Series and then use cummax:
df[df['b'].eq('+').groupby(df['a']).cummax()]
output:
index a b
1 1 A +
2 2 A -
4 4 C +
5 5 C -
If you also want to delete groups that start with - ("delete all the rows in each group starting with '-'"), you can combine cummin/cummax:
df[df['b'].ne('-').groupby(df['a']).transform(lambda s: s.cummin().cummax())]
output:
index a b
4 4 C +
5 5 C -
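The cummin/cummax combination is easier to follow when split up. The following is only a sketch on a hand-built copy of the question's data; not_minus, after_cummin and mask are illustrative names, and the two passes are written as separate grouped calls rather than a single lambda.

```python
import pandas as pd

# Sample frame rebuilt from the question (default integer index 0..5).
df = pd.DataFrame({
    "a": ["A", "A", "A", "B", "C", "C"],
    "b": ["-", "+", "-", "-", "+", "-"],
})

# True where b is anything other than '-'.
not_minus = df["b"].ne("-")

# Running minimum per group: becomes False at the first '-' and stays False,
# so a group whose first row is '-' is False throughout.
after_cummin = not_minus.groupby(df["a"]).cummin()

# Running maximum per group: a group whose first row survived as True
# becomes True throughout; a group that started with '-' stays False.
mask = after_cummin.groupby(df["a"]).cummax()

result = df[mask]
print(result)
```

In effect, the mask broadcasts "does the group's first row differ from '-'?" to every row of the group, which is why only group C remains.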
Upvotes: 1