Reputation: 163
I have a DataFrame that looks like this table:
| index | x    | y | value_1 | cumsum_1 | cumsum_2 |
|-------|------|---|---------|----------|----------|
| 0     | 0.1  | 1 | 12      | 12       | 0        |
| 1     | 1.2  | 1 | 10      | 12       | 10       |
| 2     | 0.25 | 1 | 7       | 19       | 10       |
| 3     | 1.0  | 2 | 3       | 0        | 3        |
| 4     | 0.72 | 2 | 5       | 5        | 3        |
| 5     | 1.5  | 2 | 10      | 5        | 13       |
So my aim is to calculate the cumulative sum of value_1, but there are two conditions that must be taken into account. If x is less than 1, the cumsum() is written to column cumsum_1, and if x is greater than or equal to 1, to column cumsum_2. y indicates groups (1, 2, 3, ...); when the value in y changes, the cumsum() operation starts all over again. I think the groupby() method would help. Does somebody have an idea?
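For reference, a minimal reproduction of the table above can be built like this (only x, y and value_1 are given; the two cumsum columns are the desired output):
import pandas as pd

df = pd.DataFrame({
    'x': [0.1, 1.2, 0.25, 1.0, 0.72, 1.5],
    'y': [1, 1, 1, 2, 2, 2],
    'value_1': [12, 10, 7, 3, 5, 10],
})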
Upvotes: 1
Views: 3379
Reputation: 23217
You can use .where() with the conditions x < 1 and x >= 1 to temporarily set the values of value_1 to 0 where the condition does not hold, and then take a groupby cumsum, as in the two lines of code further down.
The second condition (restarting per group) is handled by the .groupby() call, while the first condition (the split on x) is handled by the .where() call, detailed below:
.where() keeps the column values where the condition is True and changes them (here, to 0) where the condition is False. So for the first condition, x < 1, value_1 keeps its values, and these are fed to the subsequent cumsum step, which accumulates the filtered values of value_1. For rows where x < 1 is False, value_1 is masked to 0; passing these 0s to cumsum has the same effect as leaving the original values of value_1 out of the accumulation into column cumsum_1.
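As a small illustration using the question's sample data, the masked series for the first condition looks like this before the cumsum:
print(df['value_1'].where(df['x'] < 1, 0))
# 0    12
# 1     0
# 2     7
# 3     0
# 4     5
# 5     0
# Name: value_1, dtype: int64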
The second line of code accumulates value_1 into column cumsum_2 with the opposite condition, x >= 1. In effect, these two lines allocate value_1 to cumsum_1 and cumsum_2 according to x < 1 and x >= 1, respectively.
(Thanks to @tdy for the suggestion to simplify the code)
df['cumsum_1'] = df['value_1'].where(df['x'] < 1, 0).groupby(df['y']).cumsum()
df['cumsum_2'] = df['value_1'].where(df['x'] >= 1, 0).groupby(df['y']).cumsum()
Result:
print(df)
      x  y  value_1  cumsum_1  cumsum_2
0  0.10  1       12        12         0
1  1.20  1       10        12        10
2  0.25  1        7        19        10
3  1.00  2        3         0         3
4  0.72  2        5         5         3
5  1.50  2       10         5        13
Upvotes: 1
Reputation: 2924
Similar to the approaches above, but a little more chained. The helper columns v1, v2 and v3 only exist inside the chained expression and are not added to df.
df[['cumsum_1a', 'cumsum_2a']] = (
    df.assign(
        v1=lambda temp: temp.x >= 1,              # True where x >= 1
        v2=lambda temp: temp.v1 * temp.value_1,   # value_1 where x >= 1, else 0
        v3=lambda temp: ~temp.v1 * temp.value_1,  # value_1 where x < 1, else 0
    )
    .groupby('y')[['v3', 'v2']]                   # v3 feeds cumsum_1a, v2 feeds cumsum_2a
    .cumsum()
)
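As a quick sanity check (assuming the cumsum_1 and cumsum_2 columns from the .where() answer above are already present on df), the chained version should reproduce the same numbers under the new column names:
print((df['cumsum_1a'] == df['cumsum_1']).all())   # expected: True
print((df['cumsum_2a'] == df['cumsum_2']).all())   # expected: True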
Upvotes: 0
Reputation: 260380
Here is another approach using a pivot:
(df.assign(ge1=df['x'].ge(1).map({True: 'cumsum_2', False: 'cumsum_1'}))
   .pivot(columns='ge1', values='value_1').fillna(0).groupby(df['y']).cumsum()
   .astype(int)
)
output:
ge1  cumsum_1  cumsum_2
0          12         0
1          12        10
2          19        10
3           0         3
4           5         3
5           5        13
full code:
df[['cumsum_1', 'cumsum_2']] = (df.assign(ge1=df['x'].ge(1).map({True: 'cumsum_2', False: 'cumsum_1'}))
                                  .pivot(columns='ge1', values='value_1').fillna(0).groupby(df['y']).cumsum()
                                  .astype(int)
                                )
(or use pd.concat to concatenate)
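A sketch of the pd.concat variant, assuming the pivoted result from above is kept in an intermediate variable (here called out):
out = (df.assign(ge1=df['x'].ge(1).map({True: 'cumsum_2', False: 'cumsum_1'}))
         .pivot(columns='ge1', values='value_1').fillna(0).groupby(df['y']).cumsum()
         .astype(int)
       )
df = pd.concat([df, out], axis=1)  # appends cumsum_1 / cumsum_2 next to the original columns
Either way, the resulting frame is the one shown below.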
output:
   index     x  y  value_1  cumsum_1  cumsum_2
0      0  0.10  1       12        12         0
1      1  1.20  1       10        12        10
2      2  0.25  1        7        19        10
3      3  1.00  2        3         0         3
4      4  0.72  2        5         5         3
5      5  1.50  2       10         5        13
Upvotes: 0