Reputation: 23
index [0, 1, 2, 3, 4, 5]
part_1 [4, 5, 6, 4, 8, 4]
part_2 [11, 12, 10, 12, 14, 13]
new [6, 4, 8, 8, na, na]
I'm a beginner in python & pandas asking for support. In a simple dataframe, I want to create a new column that gives me the last row of a cumulative sum that satisfies the condition
df.part_1.cumsum() > df.part_2
So e.g. for the new column at index 0 I would get the value 6 as (4+5+6) > 11.
Thanks!
Upvotes: 2
Views: 1109
Reputation: 88246
IIUC here a NumPy
based approach. The idea is to build an upper triangular matrix, with shifted versions of the input array in each row. By taking the cumulative sum of these, and comparing against the second column of the dataframe, we can find using argmax
the first index where a value in the cumulative sequences is greater than the third dataframe column in the corresponding index:
a = df.to_numpy()
cs = np.triu(a[:,1]).cumsum(1)
ix = (cs >= a[:,2,None]).argmax(1)
# array([2, 3, 3, 4, 6, 7, 7, 0], dtype=int64)
df['first_ix'] = a[ix,1,None]
print(df)
index part_1 part_2 first_ix
0 0 4 11 6
1 1 5 12 4
2 2 6 10 4
3 3 4 12 8
4 4 8 14 6
5 5 4 13 8
6 6 6 11 8
7 7 8 10 4
Upvotes: 1