Reputation: 86775
I have a dataframe of several thousand timeseries.
For each timeseries, there is a unique timestamp, so I can enforce the order.
Each timeseries has a state column
I need to 'discard' all rows before there have been a certain number of consecutive 1's
So, here's some sample data...
test = pd.DataFrame({
'group': [1,1,1,1,1,1,1, 2,2,2,2,2,2,2],
'idx': [0,1,2,3,4,5,6, 0,1,2,3,4,5,6],
'value': [0,1,0,1,1,1,1, 0,1,1,1,0,1,0],
})
The results that I want are...
desired_result = pd.DataFrame({
'group': [ 1,1,1, 2,2,2,2,2],
'idx': [ 4,5,6, 2,3,4,5,6],
'value': [ 1,1,1, 1,1,0,1,0],
})
What I think I need to calculate is...
test = pd.DataFrame({
'group': [1,1,1,1,1,1,1, 2,2,2,2,2,2,2],
'idx': [0,1,2,3,4,5,6, 0,1,2,3,4,5,6],
'value': [0,1,0,1,1,1,1, 0,1,1,1,0,1,0],
#'consec':[0,1,0,1,2,3,4, 0,1,2,3,0,1,0], -- the cumulative sum of value, but resetting whenever a 0 is encountered
#'max_c': [0,1,1,1,2,3,4, 0,1,2,3,3,3,3], -- the cumulative max of consec
# ^ ^ ^ ^ ^ ^ ^ ^ -- rows I want to keep, as max_c >= 2
})
Then I can just take the rows where test[ test['max_c'] >= 2 ]
But, how do I calculate consec
?
value
, resetting at 0's, independently by group
?EDIT: My best attempt, but it feels ridiculously long winded...
test['cumsum'] = test.groupby(['group'])['value'].cumsum()
test['reset'] = test['cumsum'][ test.groupby(['group'])['value'].diff() == -1 ]
test['reset'] = test['reset'].fillna(0)
test['reset_cummax'] = test.groupby(['group'])['reset'].cummax()
test['consec'] = test['cumsum'] - test['reset_cummax']
test['c_max'] = test.groupby(['group'])['consec'].cummax()
Upvotes: 0
Views: 594
Reputation: 29635
IIUC, you can do the cumsum
after a groupby
on the column group and every time the column value is eq
to 0, you create a new group with cumsum
as well.
test['consec'] = test.groupby(['group', test['value'].eq(0).cumsum()])['value'].cumsum()
test['max_c'] = test.groupby(['group'])['consec'].cummax()
print(test)
group idx value consec max_c
0 1 0 0 0 0
1 1 1 1 1 1
2 1 2 0 0 1
3 1 3 1 1 1
4 1 4 1 2 2
5 1 5 1 3 3
6 1 6 1 4 4
7 2 0 0 0 0
8 2 1 1 1 1
9 2 2 1 2 2
10 2 3 1 3 3
11 2 4 0 0 3
12 2 5 1 1 3
13 2 6 0 0 3
Upvotes: 2