Reputation: 23567
I have a DataFrame with 2 columns:
import pandas as pd
data = pd.DataFrame({'a':[1,2,1,4,1,1,3,1,4,1,1,1],'b':[5,2,8,3,10,3,5,15,45,41,23,9]})
    a   b
0   1   5
1   2   2
2   1   8
3   4   3
4   1  10
5   1   3
6   3   5
7   1  15
8   4  45
9   1  41
10  1  23
11  1   9
Is there a Pythonic or faster way to pick out the row indices at which the cumulative sum of column a, counted from the row after the previous such index, reaches or exceeds a given threshold? For example, in the above df, with a threshold of 5, I would get indices 3, 6, 8.
The way I'm currently doing it is to loop through every row and keep track of when the running total reaches the threshold. I am not enough of a Python expert to come up with a better way, if one exists.
thanks
Upvotes: 3
Views: 166
Reputation: 92854
Until someone comes up with a pandas one-liner (if that is possible), you could try the following approach:
From IPython session:
In [393]: get_a_cumsum_lim = lambda df, col, threshold: df[col][df[col].cumsum() >= threshold]
In [394]: s, result = get_a_cumsum_lim(data, 'a', 5), []
In [395]: while not s.empty:
     ...:     idx = s.index[0]
     ...:     result.append(idx)
     ...:     s = get_a_cumsum_lim(data[idx+1:], 'a', 5)
     ...:
In [396]: result
Out[396]: [3, 6, 8]
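The loop above recomputes the cumulative sum on the remaining slice after every hit. If that becomes slow on a long column, a single pass with a running total that restarts after each hit may be enough. This is only a minimal sketch, not a pandas built-in: the helper name threshold_positions is made up for illustration, it works on a plain list, and it returns positions, which match the index labels only if the DataFrame has the default 0..n-1 index.

```python
def threshold_positions(values, threshold):
    """Return positions where a running sum, restarted after each hit,
    reaches or exceeds the threshold (hypothetical helper)."""
    hits = []
    running = 0
    for pos, value in enumerate(values):
        running += value
        if running >= threshold:
            hits.append(pos)
            running = 0  # start counting again from the next row
    return hits


# Same values as column 'a' in the question; with pandas you could
# pass data['a'].tolist() instead (assumes the default integer index).
a = [1, 2, 1, 4, 1, 1, 3, 1, 4, 1, 1, 1]
print(threshold_positions(a, 5))
```

For the question's data and a threshold of 5 this gives the same three indices as the session above.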
Upvotes: 1