Reputation: 12107
let's assume a dataframe like this:
idx x y
0 a 3
1 b 2
2 c 0
3 d 2
4 e 5
how can I trim the bottom rows, based on a condition, so that any row after the last one matching the condition would be removed?
for example:
with the following condition: y == 0
the output would be
idx x y
0 a 3
1 b 2
2 c 0
the condition can happen many times, but the last one is the one that triggers the cut.
Upvotes: 2
Views: 4220
Reputation: 42916
Usng index.max
& iloc
:
index.max
to get the last row with condition y==0
iloc
to slice of the dataframe on the index found with df['y'].eq(0)
idx = df.query('y.eq(0)').index.max()+1
# idx = df.query('y==0').index.max()+1 -- if pandas < 0.25
df.iloc[:idx]
Output
x y
0 a 3
1 b 2
2 c 0
Using np.where
idx = np.where(df['y'].eq(0), df.index, 0).max()+1
df.iloc[:idx]
Output
x y
0 a 3
1 b 2
2 c 0
Upvotes: 4
Reputation: 2417
you could do, here np.where
returns a tuple, so we access the value of the indexes as the first element of the tuple using np.where(df.y == 0)
, the first occurence is then returned as the last element of this vector, finaly we add 1 to the index so we can include this index of the last occurence while slicing
df_cond = df.iloc[:np.where(df.y == 0)[0][-1]+1, :]
or you could do :
df_cond = df[ :df.y.eq(0).cumsum().idxmax()+1 ]
Upvotes: 1
Reputation: 2579
I would do something like this:
df.iloc[:df['y'].eq(0).idxmax()+1]
Just look for the largest index where your condition is true.
EDIT
So the above code wont work because idxmax() still only takes the first index where the value is true. So we we can do the following to trick it:
df.iloc[:df['y'].eq(0).sort_index(ascending = False).idxmax()+1]
Flip the index, so the last index is the first index that idxmax picks up.
Upvotes: 0
Reputation: 53
Set up your dataframe:
data = [
[ 'a', 3],
[ 'b' , 2],
[ 'c' , 0],
[ 'd', 2],
[ 'e' , 5]
]
df = pd.DataFrame(data, columns=['x', 'y']).reset_index().rename(columns={'index':'idx'}).sort_values('idx')
Then find your cutoff (assuming the idx column is already sorted):
cutoff = df[df['y'] == 0].idx.min()
The df['y'] == 0 is your condition. Then get the min idx that meets that condition and save it as our cutoff.
Finally, create a new dataframe using your cutoff:
df_new = df[df.idx <= cutoff].copy()
Output:
df_new
idx x y
0 0 a 3
1 1 b 2
2 2 c 0
Upvotes: 0