007
007

Reputation: 23

filter rows to keep only until last occurence of a condition

I have a dataframe after groupby based on an id.

import pandas as pd
df = pd.DataFrame({'date': ['2022/01/01', '2022/01/02', '2022/01/03', '2022/01/04', '2022/01/05', '2022/01/06', '2022/01/07', '2022/01/08', '2022/01/09', '2022/01/10', '2022/01/11', '2022/01/12', '2022/01/13', '2022/01/14', '2022/01/15', '2022/01/16', '2022/01/17', '2022/01/18', '2022/01/19', '2022/01/20', '2022/01/21', '2022/01/22', '2022/01/23', '2022/01/24', '2022/01/25', '2022/01/26', '2022/01/27', '2022/01/28', '2022/01/29', '2022/01/30', '2022/01/31'],
                   'value': [1, 3, 5, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]})

in the above dataframe, after 2022/01/06 values are zero. so I want to identify such date after which values remain zero and drop them. expected output dataframe is

>>> df
         date  value
 0  2022/01/01  1
 1  2022/01/02  3
 2  2022/01/03  5
 3  2022/01/04  0
 4  2022/01/05  0
 5  2022/01/06  2

so in the example, I should be able to identify that 2022/01/06 is cancel date as after which values for all dates are 0s.

Upvotes: 1

Views: 387

Answers (2)

jezrael
jezrael

Reputation: 862681

Use cumulative sum in inverted Series by Series.iloc with Series.cumsum and compare for not equal 0 - so it remove all last 0 values per column value in boolean indexing:

df = df[df['value'].iloc[::-1].cumsum().iloc[::-1].ne(0)]
print (df)
         date  value
0  2022/01/01      1
1  2022/01/02      3
2  2022/01/03      5
3  2022/01/04      0
4  2022/01/05      0
5  2022/01/06      2

With drop is it possible:

df.drop(df[df['value'].iloc[::-1].cumsum().eq(0).iloc[::-1]].index)

EDIT: Solution per groups with GroupBy.cumsum:

df = df[df.iloc[::-1].groupby('ID')['value'].cumsum().iloc[::-1].ne(0)]

Upvotes: 2

mozway
mozway

Reputation: 260725

Just thinking of this solution now, using last_valid_index on the masked column:

df.loc[:df['value'].mask(df['value'].eq(0)).last_valid_index()]

output:

         date  value
0  2022/01/01      1
1  2022/01/02      3
2  2022/01/03      5
3  2022/01/04      0
4  2022/01/05      0
5  2022/01/06      2

Upvotes: 0

Related Questions