Neil
Neil

Reputation: 8247

how to generate dynamic columns in pandas

I have following dataframe in pandas

code    tank     nozzle_1   nozzle_2    nozzle_var
123     1        1          1           10   
123     1        2          2           12
123     2        1          1           10   
123     2        2          2           12

I want to calculate cumulative sum of columns nozzle_1 and nozzle_2 grouping over tank. Following is my desired dataframe.

code    tank     nozzle_1   nozzle_2    nozzle_var   nozzle_1_cumsum   nozzle_2_cumsum
123     1        1          1           10           1                 1 
123     1        2          2           12           3                 3
123     2        1          1           10           1                 1
123     2        2          2           12           3                 3

I am getting nozzle_1 and nozzle_2 from following code in pandas

cols= df.columns[df.columns.str.contains(pat='nozzle_\d+$', regex=True)] 

How can I calculate cumsum from above list of columns

Upvotes: 1

Views: 67

Answers (2)

Dev Khadka
Dev Khadka

Reputation: 5451

import pandas as pd

d = {
    "tank":[1,1,1,2],
    "nozzle_1":[1,2,1,2],
    "nozzle_2":[1,2,1,2]
}

df = pd.DataFrame(d)
columns = df.columns[df.columns.str.match("nozzle_\d+$")]

df[columns + "_cumsum"] = \
    df.groupby("tank")[columns].cumsum(axis=0)
df

Output

    tank    nozzle_1    nozzle_2    nozzle_1_cumsum nozzle_2_cumsum
0   1   1   1   1   1
1   1   2   2   3   3
2   1   1   1   4   4
3   2   2   2   2   2

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153460

How about this fancy solution:

cols= df.columns[df.columns.str.contains(pat='nozzle_\d+$', regex=True)] 
df.assign(**df.groupby('tank')[cols].agg(['cumsum'])\
              .pipe(lambda x: x.set_axis(x.columns.map('_'.join), axis=1, inplace=False)))

Output:

   tank  nozzle_1  nozzle_2  nozzle_var  nozzle_1_cumsum  nozzle_2_cumsum
0     1         1         1          10                1                1
1     1         2         2          12                3                3
2     2         1         1          10                1                1
3     2         2         2          12                3                3

In steps:

df_cumsum = df.groupby('tank')[cols].agg(['cumsum'])
df_cumsum.columns = df_cumsum.columns.map('_'.join)
pd.concat([df, df_cumsum], axis=1)

Output:

   tank  nozzle_1  nozzle_2  nozzle_var  nozzle_1_cumsum  nozzle_2_cumsum
0     1         1         1          10                1                1
1     1         2         2          12                3                3
2     2         1         1          10                1                1
3     2         2         2          12                3                3

Upvotes: 2

Related Questions