Michael Dorner

Reputation: 20175

Count most recent zeros in pandas data frame

import pandas as pd

date_0 = list(pd.date_range('2017-01-01', periods=6, freq='MS'))
date_1 = list(pd.date_range('2017-01-01', periods=8, freq='MS'))
data_0 = [9, 8, 4, 0, 0, 0]
data_1 = [9, 9, 0, 0, 0, 7, 0, 0]
id_0 = [0]*6
id_1 = [1]*8
df = pd.DataFrame({'ids': id_0 + id_1, 'dates': date_0 + date_1, 'data': data_0 + data_1})

For each id (here 0 and 1), I want to know the length of the run of zeros at the end of its time series.

For the given example, the result is id_0 = 3, id_1 = 2.

So how do I restrict each group to its trailing zeros, so that I can then run something like this:

df.groupby('ids').agg('count')

Upvotes: 1

Views: 29

Answers (1)

jezrael

Reputation: 863226

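First, label every run of consecutive equal values: compare the column with its shifted copy for inequality and take the cumulative sum. Multiplying by the negated boolean mask then keeps that label only on the zero rows and sets all non-zero rows to 0.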
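To make the labelling step easier to follow, here is a minimal sketch that prints the intermediate labels for the question's data. The names `run_id` and `zero_run_id` are only illustrative and do not appear in the original code.

```python
import pandas as pd

data = pd.Series([9, 8, 4, 0, 0, 0, 9, 9, 0, 0, 0, 7, 0, 0])

# A new run starts wherever a value differs from the one before it,
# so the cumulative sum of those change points numbers the runs.
run_id = data.ne(data.shift()).cumsum()

# Keep the run number only where the value is zero; non-zero rows become 0.
zero_run_id = run_id.mul(~data.astype(bool))

print(zero_run_id.tolist())
# [0, 0, 0, 4, 4, 4, 0, 0, 6, 6, 6, 0, 8, 8]
```

Each distinct non-zero label marks one run of zeros, which is what the grouping step counts.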

Then count the rows per group, drop the run-label level (level=1) of the MultiIndex, and keep the last value per id using drop_duplicates with keep='last':

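# label runs of equal values, then keep the label only on zero rows
s = df['data'].ne(df['data'].shift()).cumsum().mul(~df['data'].astype(bool))
# size of every (id, run label) pair
df = (s.groupby([df['ids'], s]).size()
       # drop the run label, keep the id
       .reset_index(level=1, drop=True)
       .reset_index(name='val')
       # the last run per id is the most recent one
       .drop_duplicates('ids', keep='last'))
print(df)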
   ids  val
1    0    3
4    1    2
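One caveat, as far as I can tell: the code above returns the length of the last run of zeros per id, so it matches the question only when every id actually ends in a zero. If an id may end in a non-zero value, a different technique is needed. The following is a sketch of one alternative, not the method above: reverse each group and count zeros until the first non-zero value. The helper `trailing_zeros` and the extra id 2 are hypothetical, added only to illustrate that case.

```python
import pandas as pd


def trailing_zeros(values: pd.Series) -> int:
    """Count the zeros at the end of `values` (0 if the last value is non-zero)."""
    is_zero_reversed = values.eq(0).to_numpy()[::-1]
    # cumprod stays 1 while only zeros have been seen and is 0 from the
    # first non-zero value onwards, so its sum is the trailing-zero count.
    return int(is_zero_reversed.cumprod().sum())


dates = (list(pd.date_range('2017-01-01', periods=6, freq='MS'))
         + list(pd.date_range('2017-01-01', periods=8, freq='MS'))
         + list(pd.date_range('2017-01-01', periods=3, freq='MS')))
df = pd.DataFrame({
    'ids': [0] * 6 + [1] * 8 + [2] * 3,   # id 2 is a made-up extra case
    'dates': dates,
    'data': [9, 8, 4, 0, 0, 0] + [9, 9, 0, 0, 0, 7, 0, 0] + [0, 0, 5],
})

result = (df.sort_values(['ids', 'dates'])   # make sure "end" means latest date
            .groupby('ids')['data']
            .apply(trailing_zeros)
            .reset_index(name='val'))
print(result)
```

For ids 0 and 1 this gives 3 and 2, as in the question, and 0 for the hypothetical id 2, whose series ends in a non-zero value.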

Upvotes: 1
