Beginner23
Beginner23

Reputation: 23

In pandas dataframe - returning last value of cumulative sum that satisfies condition

index [0, 1, 2, 3, 4, 5]
part_1 [4, 5, 6, 4, 8, 4]
part_2 [11, 12, 10, 12, 14, 13]
new [6, 4, 8, 8, na, na]

I'm a beginner in python & pandas asking for support. In a simple dataframe, I want to create a new column that gives me the last row of a cumulative sum that satisfies the condition

df.part_1.cumsum() > df.part_2

So e.g. for the new column at index 0 I would get the value 6 as (4+5+6) > 11.

Thanks!

Upvotes: 2

Views: 1109

Answers (1)

yatu
yatu

Reputation: 88246

IIUC here a NumPy based approach. The idea is to build an upper triangular matrix, with shifted versions of the input array in each row. By taking the cumulative sum of these, and comparing against the second column of the dataframe, we can find using argmax the first index where a value in the cumulative sequences is greater than the third dataframe column in the corresponding index:

a = df.to_numpy()
cs = np.triu(a[:,1]).cumsum(1)
ix = (cs >= a[:,2,None]).argmax(1)
# array([2, 3, 3, 4, 6, 7, 7, 0], dtype=int64)
df['first_ix'] = a[ix,1,None]

print(df)
   index  part_1  part_2  first_ix
0      0       4      11         6
1      1       5      12         4
2      2       6      10         4
3      3       4      12         8
4      4       8      14         6
5      5       4      13         8
6      6       6      11         8
7      7       8      10         4

Upvotes: 1

Related Questions