Horst-Jackson

Reputation: 163

Pandas: Cumulative sum within group with two conditions

I have a DataFrame that looks like this table:

index  x     y  value_1  cumsum_1  cumsum_2
0      0.1   1  12       12        0
1      1.2   1  10       12        10
2      0.25  1  7        19        10
3      1.0   2  3        0         3
4      0.72  2  5        5         3
5      1.5   2  10       5         13

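My aim is to calculate the cumulative sum of value_1, with two conditions taken into account: value_1 should be accumulated into cumsum_1 when x < 1 and into cumsum_2 when x >= 1, and the sums should be computed separately within each group of y.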

Does somebody have any idea?

Upvotes: 1

Views: 3379

Answers (3)

SeaBean

Reputation: 23217

You can use .where() with the conditions x < 1 and x >= 1 to temporarily set value_1 to 0 wherever the condition does not hold, and then apply groupby and cumsum. The code is given after the explanation.

The second condition is handled by .groupby(), while the first condition is handled by .where(), as detailed below:

.where() keeps the column values where the condition is true and replaces them (with 0 in this case) where the condition is false. So for the first condition, x < 1, value_1 keeps its values on the matching rows, and these feed into the subsequent cumsum step. On rows where x < 1 is False, value_1 is masked to 0. Adding 0 leaves the running total unchanged, so this has the same effect as excluding those rows from the accumulation into cumsum_1.
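To see the masking in isolation, here is a minimal sketch on the first three rows of the question's data. The variable names x, value_1 and masked are only for illustration, and grouping is left out on purpose.

```python
import pandas as pd

# First three rows of the question's table (all in group y == 1)
x = pd.Series([0.1, 1.2, 0.25])
value_1 = pd.Series([12, 10, 7])

# Keep value_1 where x < 1, replace it with 0 elsewhere
masked = value_1.where(x < 1, 0)
print(masked.tolist())           # [12, 0, 7]

# The zeros add nothing, so the running total only grows on rows with x < 1
print(masked.cumsum().tolist())  # [12, 12, 19]
```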

The second line of code accumulates value_1 into cumsum_2 under the opposite condition, x >= 1. In effect, the two lines allocate value_1 to cumsum_1 or cumsum_2 according to whether x < 1 or x >= 1.

(Thanks to @tdy for the suggestion to simplify the code)

df['cumsum_1'] = df['value_1'].where(df['x'] < 1, 0).groupby(df['y']).cumsum()
df['cumsum_2'] = df['value_1'].where(df['x'] >= 1, 0).groupby(df['y']).cumsum()

Result:

print(df)

      x  y  value_1  cumsum_1  cumsum_2
0  0.10  1       12        12         0
1  1.20  1       10        12        10
2  0.25  1        7        19        10
3  1.00  2        3         0         3
4  0.72  2        5         5         3
5  1.50  2       10         5        13
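If you want to reproduce this yourself, a self-contained sketch could look like the following. The DataFrame is rebuilt by hand from the question's input columns, and the assertions simply compare against the result table above.

```python
import pandas as pd

# Rebuild the question's input columns
df = pd.DataFrame({
    'x': [0.1, 1.2, 0.25, 1.0, 0.72, 1.5],
    'y': [1, 1, 1, 2, 2, 2],
    'value_1': [12, 10, 7, 3, 5, 10],
})

# Mask value_1 to 0 where the condition fails, then accumulate within each y group
df['cumsum_1'] = df['value_1'].where(df['x'] < 1, 0).groupby(df['y']).cumsum()
df['cumsum_2'] = df['value_1'].where(df['x'] >= 1, 0).groupby(df['y']).cumsum()

# Compare against the result table
assert df['cumsum_1'].tolist() == [12, 12, 19, 0, 5, 5]
assert df['cumsum_2'].tolist() == [0, 10, 10, 3, 3, 13]
```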

Upvotes: 1

Quixotic22

Reputation: 2924

Similar to the approaches above, but a little more chained.

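df[['cumsum_1a', 'cumsum_2a']] = (
    df.assign(
        v1=lambda temp: temp.x >= 1,              # True where x >= 1
        v2=lambda temp: temp.v1 * temp.value_1,   # value_1 where x >= 1, else 0
        v3=lambda temp: ~temp.v1 * temp.value_1,  # value_1 where x < 1, else 0
    )
    .groupby('y')[['v3', 'v2']]  # v3 (x < 1) first so it lands in cumsum_1a
    .cumsum()
)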
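The assign step relies on multiplying a boolean Series by a numeric one. A small sketch of just that part, on the first three rows of the question's data (is_ge1 is a made-up name for the example):

```python
import pandas as pd

value_1 = pd.Series([12, 10, 7])
is_ge1 = pd.Series([0.1, 1.2, 0.25]) >= 1    # [False, True, False]

# In arithmetic, True acts as 1 and False as 0, so the product masks value_1
print((is_ge1 * value_1).tolist())     # [0, 10, 0]

# ~ inverts the mask; it binds tighter than *, the parentheses are only for readability
print(((~is_ge1) * value_1).tolist())  # [12, 0, 7]
```

This gives the same masked values as .where(cond, 0), which is why a grouped cumsum over these columns produces the two conditional running totals.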

Upvotes: 0

mozway

Reputation: 260380

Here is another approach using a pivot:

(df.assign(ge1=df['x'].ge(1).map({True: 'cumsum_2', False: 'cumsum_1'}))
   .pivot(columns='ge1', values='value_1').fillna(0).groupby(df['y']).cumsum()
   .astype(int)
)

output:

ge1  cumsum_1  cumsum_2
0          12         0
1          12        10
2          19        10
3           0         3
4           5         3
5           5        13

full code:

df[['cumsum_1', 'cumsum_2']] = (df.assign(ge1=df['x'].ge(1).map({True: 'cumsum_2', False: 'cumsum_1'}))
                                  .pivot(columns='ge1', values='value_1').fillna(0).groupby(df['y']).cumsum()
                                  .astype(int)
                                )

(or use pd.concat to concatenate)

output:

   index     x  y  value_1  cumsum_1  cumsum_2
0      0  0.10  1       12        12         0
1      1  1.20  1       10        12        10
2      2  0.25  1        7        19        10
3      3  1.00  2        3         0         3
4      4  0.72  2        5         5         3
5      5  1.50  2       10         5        13
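For the pd.concat alternative mentioned above, a possible sketch is below. The names sums and out are made up for the example, and the DataFrame is rebuilt by hand from the question's input columns (without the extra index column).

```python
import pandas as pd

df = pd.DataFrame({
    'x': [0.1, 1.2, 0.25, 1.0, 0.72, 1.5],
    'y': [1, 1, 1, 2, 2, 2],
    'value_1': [12, 10, 7, 3, 5, 10],
})

# Same pivot as above: one column per target, NaN filled with 0, cumsum per y group
sums = (df.assign(ge1=df['x'].ge(1).map({True: 'cumsum_2', False: 'cumsum_1'}))
          .pivot(columns='ge1', values='value_1').fillna(0).groupby(df['y']).cumsum()
          .astype(int)
       )

# Concatenate side by side on the shared index; df itself is left unchanged
out = pd.concat([df, sums], axis=1)
print(out)
```

Compared with assigning into df[['cumsum_1', 'cumsum_2']], this returns a new DataFrame instead of modifying df in place.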

Upvotes: 0
