Thomas

Reputation: 12107

trim last rows of a pandas dataframe based on a condition

let's assume a dataframe like this:

idx  x  y
0    a  3
1    b  2
2    c  0
3    d  2
4    e  5

How can I trim the bottom rows based on a condition, so that every row after the last one matching the condition is removed?

for example:

with the following condition: y == 0

the output would be

idx  x  y
0    a  3
1    b  2
2    c  0

The condition can match many times, but the last match is the one that triggers the cut.

Upvotes: 2

Views: 4220

Answers (4)

Erfan

Reputation: 42916

Method 1:

Using index.max & iloc:

  • index.max to get the last row matching the condition y == 0
  • iloc to slice the dataframe up to the index found with df['y'].eq(0)
idx = df.query('y.eq(0)').index.max()+1 
# idx = df.query('y==0').index.max()+1 -- if pandas < 0.25 

df.iloc[:idx]

Output

   x  y
0  a  3
1  b  2
2  c  0
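
Note that Method 1 hands an index label to iloc, which works here because the labels of the default RangeIndex coincide with row positions. As a minimal sketch, assuming a frame whose index is not the default one (the string labels below are made up for illustration), the same trim can be done purely by position:

```python
import numpy as np
import pandas as pd

# Same data as the question, but with a hypothetical non-default index.
df = pd.DataFrame(
    {"x": list("abcde"), "y": [3, 2, 0, 2, 5]},
    index=list("pqrst"),
)

# Positions (not labels) of the rows where the condition holds.
pos = np.flatnonzero(df["y"].eq(0).to_numpy())

# Keep everything up to and including the last match.
# Returning the whole frame when nothing matches is an assumption.
trimmed = df.iloc[: pos[-1] + 1] if pos.size else df
print(trimmed)
```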

Method 2:

Using np.where

idx = np.where(df['y'].eq(0), df.index, 0).max()+1
df.iloc[:idx]

Output

   x  y
0  a  3
1  b  2
2  c  0
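
One thing to be aware of with Method 2: the fallback value 0 in np.where means that when no row matches, idx becomes 1 and only the first row is kept. A small sketch of that case, assuming you would rather keep the whole frame when there is no match (that choice is an assumption, not part of the question):

```python
import numpy as np
import pandas as pd

# Variant of the sample data with no y == 0 at all.
df = pd.DataFrame({"x": list("abcde"), "y": [3, 2, 1, 2, 5]})

mask = df["y"].eq(0)

# Method 2 as written: every non-matching row contributes 0, so max() is 0.
idx = np.where(mask, df.index, 0).max() + 1

# Guard the slice so that "no match" leaves the frame untouched.
trimmed = df.iloc[:idx] if mask.any() else df
print(idx, len(trimmed))
```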

Upvotes: 4

Ayoub ZAROU

Reputation: 2417

You could do the following. Here np.where returns a tuple, so we take its first element, np.where(df.y == 0)[0], which holds the positions of the matching rows. The last occurrence is then the last element of this vector. Finally, we add 1 to that position so that the row of the last occurrence is included when slicing.

df_cond = df.iloc[:np.where(df.y == 0)[0][-1]+1, :]

Or you could do:

df_cond  = df[ :df.y.eq(0).cumsum().idxmax()+1 ]
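
To see why the cumsum version picks the last match, here is a short sketch on a variant of the data where y == 0 occurs twice (the extra rows are made up for illustration). The running count reaches its maximum for the first time exactly at the last match, and idxmax returns that first occurrence of the maximum:

```python
import pandas as pd

# Variant of the sample data with two matches (rows 2 and 4).
df = pd.DataFrame({"x": list("abcdef"), "y": [3, 2, 0, 2, 0, 5]})

# Running count of matches seen so far.
counts = df["y"].eq(0).cumsum()

# idxmax gives the first label where the count hits its maximum,
# which is the label of the last matching row.
last = counts.idxmax()

# With the default RangeIndex, labels and positions coincide.
trimmed = df.iloc[: last + 1]
print(counts.tolist(), last)
```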

Upvotes: 1

Ben Pap

Reputation: 2579

I would do something like this:

df.iloc[:df['y'].eq(0).idxmax()+1]

Just look for the largest index where your condition is true.

EDIT

So the above code won't work, because idxmax() only returns the first index where the value is True. We can do the following to trick it:

df.iloc[:df['y'].eq(0).sort_index(ascending = False).idxmax()+1]

Flip the index, so the last index is the first index that idxmax picks up.
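
A quick sketch of the difference, again on a made-up variant of the data with two matches. It uses iloc[::-1] to reverse the rows, which is an alternative to sort_index(ascending=False) and gives the same order for the default index:

```python
import pandas as pd

# Variant of the sample data with two matches (rows 2 and 4).
df = pd.DataFrame({"x": list("abcdef"), "y": [3, 2, 0, 2, 0, 5]})

mask = df["y"].eq(0)

# idxmax on a boolean Series returns the label of the first True.
first = mask.idxmax()

# Reversing the row order makes the last True the first one encountered.
last = mask.iloc[::-1].idxmax()

trimmed = df.iloc[: last + 1]
print(first, last)
```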

Upvotes: 0

Andy

Reputation: 53

Set up your dataframe:

import pandas as pd

data = [
    ['a', 3],
    ['b', 2],
    ['c', 0],
    ['d', 2],
    ['e', 5],
]
# Expose the default index as an explicit 'idx' column, sorted ascending
df = pd.DataFrame(data, columns=['x', 'y']).reset_index().rename(columns={'index': 'idx'}).sort_values('idx')

Then find your cutoff (assuming the idx column is already sorted):

cutoff = df[df['y'] == 0].idx.max()

The df['y'] == 0 is your condition. Then get the max idx that meets that condition (the last match, which is what the question asks for) and save it as our cutoff.

Finally, create a new dataframe using your cutoff:

df_new = df[df.idx <= cutoff].copy()

Output:

df_new

   idx  x   y
0   0   a   3
1   1   b   2
2   2   c   0
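
With a single match, the smallest and largest matching idx are the same, so the sample data cannot tell them apart. As a sketch on a variant with two matches (the extra rows are made up for illustration), taking the max of idx keeps everything up to the last match, while the min would stop at the first one:

```python
import pandas as pd

# Variant of the sample data with y == 0 at idx 1 and idx 3.
df = pd.DataFrame({"x": list("abcdef"), "y": [3, 0, 2, 0, 5, 1]})
df = df.reset_index().rename(columns={"index": "idx"})

# idx values of all rows meeting the condition.
matches = df.loc[df["y"] == 0, "idx"]

first_cut = df[df["idx"] <= matches.min()]  # stops at the first match
last_cut = df[df["idx"] <= matches.max()]   # stops at the last match
print(len(first_cut), len(last_cut))
```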

Upvotes: 0
