Lynn
Lynn

Reputation: 4398

perform cumulative calculations based on certain columns in Python

I have a dataframe where I would like to perform cumulative calculations based on certain columns. Needs to be grouped by id and date.

Data

id  date    pwr start   move
aa  q1 22   10  4       2
aa  q2 22   5   3       1
aa  q3 22   7   3       0
aa  q4 22   3   3       0
bb  q1 22   20  1       0
bb  q2 22   10  2       0
bb  q3 22   5   2       1
bb  q4 22   5   1       1

Logic:

final column values created by:

  1. ‘start’ column - ‘move’column
  2. ‘final’ column - ‘pwr’ column - ‘move’ column

Desired

id  date    pwr start   move    final
aa  q1 22   10  4       2       2
aa  q2 22   5   3       1      -4
aa  q3 22   7   3       0      -11
aa  q4 22   3   3       0      -14
bb  q1 22   20  1       0       1
bb  q2 22   10  2       0      -9
bb  q3 22   5   2       1      -15
bb  q4 22   5   1       1      -21

Doing

df['final'] = df['start'].sub(df.groupby('id')['date'].cumsum())
df = df.assign(start = df['start'] - df['move'])

Performing cumsum grouping by id and date- however, the values in the final column are not calculating correctly. I am still troubleshooting this, any suggestion is appreciated.

Upvotes: 1

Views: 76

Answers (1)

Corralien
Corralien

Reputation: 120429

Try:

def f(x):
    final = x['start'].sub(x['move']).iloc[0]
    return x.shift(-1)[['pwr', 'move']].sum(axis=1).mul(-1) \
            .shift(fill_value=final).cumsum()

df['final'] = df.groupby('id').apply(f).astype(int).values
>>> df
   id   date  pwr  start  move  final
0  aa  q1 22   10      4     2      2
1  aa  q2 22    5      3     1     -4
2  aa  q3 22    7      3     0    -11
3  aa  q4 22    3      3     0    -14
4  bb  q1 22   20      1     0      1
5  bb  q2 22   10      2     0     -9
6  bb  q3 22    5      2     1    -15
7  bb  q4 22    5      1     1    -21

Upvotes: 1

Related Questions