sum column based on level selected in column header

Question

I have a pandas DataFrame that looks like this. Note that the column names represent levels.

   df
             PC        0     1     2    3 
       0    PC_1      0.74  0.25  0.1  0.0   
       1    PC_1      0.72  0.26  0.1  0.1 
       2    PC_2      0.80  0.18  0.2  0.0
       3    PC_3      0.79  0.19  0.1  0.1

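I want to add four new columns next to the existing ones. For a chosen level, every column before that level becomes 0, the level's column becomes the sum of all columns up to and including it, and the columns after it are copied unchanged.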

For example: if level =1, df should look like this:

       df
             PC        0     1     2    3    0_1     1_1         2_1  3_1
       0    PC_1      0.74  0.25  0.1  0.0   0.0   (0.74+0.25)   0.1  0.0
       1    PC_1      0.72  0.26  0.1  0.1   0.0   (0.72+0.26)   0.1  0.1 
       2    PC_2      0.80  0.18  0.2  0.0   0.0   (0.80+0.18)   0.2  0.0
       3    PC_3      0.79  0.19  0.1  0.1   0.0   (0.79+0.19)   0.1  0.1

If level=3,

       df
             PC        0     1     2    3    0_3   1_3  2_3   3_3
       0    PC_1      0.74  0.25  0.1  0.0   0.0   0.0  0.0  sum(0.74+0.25+0.1+0.0)
       1    PC_1      0.72  0.26  0.1  0.1   0.0   0.0  0.0  sum(0.72+0.26+0.1+0.1)
       2    PC_2      0.80  0.18  0.2  0.0   0.0   0.0  0.0  sum(0.80+0.18+0.20+0.0)
       3    PC_3      0.79  0.19  0.1  0.1   0.0   0.0  0.0  sum(0.79+0.19+0.1+0.1)

I don't know how to solve the problem and am looking for help.

Thank you in advance.

ALollz · Accepted Answer

Set 'PC' as the index to make things easier. Then zero every column before the selected level, take the cumulative sum across each row up to and including that level, and keep the columns after it unchanged.

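import pandas as pd

df = df.set_index('PC')

def add_sum(df, level):
    # Position of the selected level among the columns
    i = df.columns.get_loc(level)
    df_add = (pd.concat([pd.DataFrame(0, index=df.index, columns=df.columns[:i]),  # zeros before the level
                         df.cumsum(axis=1).iloc[:, i],  # running row total at the level
                         df.iloc[:, i+1:]],  # columns after the level, unchanged
                        axis=1)
                .add_suffix(f'_{level}'))

    return pd.concat([df, df_add], axis=1)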

add_sum(df, '1')  # use 1 instead if the column labels are ints
         0     1    2    3  0_1   1_1  2_1  3_1
PC                                             
PC_1  0.74  0.25  0.1  0.0    0  0.99  0.1  0.0
PC_1  0.72  0.26  0.1  0.1    0  0.98  0.1  0.1
PC_2  0.80  0.18  0.2  0.0    0  0.98  0.2  0.0
PC_3  0.79  0.19  0.1  0.1    0  0.98  0.1  0.1

add_sum(df, '3')
         0     1    2    3  0_3  1_3  2_3   3_3
PC                                             
PC_1  0.74  0.25  0.1  0.0    0    0    0  1.09
PC_1  0.72  0.26  0.1  0.1    0    0    0  1.18
PC_2  0.80  0.18  0.2  0.0    0    0    0  1.18
PC_3  0.79  0.19  0.1  0.1    0    0    0  1.18
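
If you would rather keep 'PC' as a regular column instead of moving it to the index, something along these lines might work. This is only a sketch: the function name add_sum_keep_pc and the id_col parameter are made up for illustration, and it assumes the level columns are labelled with strings ('0' to '3') and hold numeric values. Unlike the version above, the zeroed columns come out as floats (0.0) rather than ints.

```python
import pandas as pd

def add_sum_keep_pc(df, level, id_col="PC"):
    # Hypothetical variant: treat every column except id_col as a level column
    level_cols = [c for c in df.columns if c != id_col]
    i = level_cols.index(level)

    # Work on a float copy so the original frame is left untouched
    arr = df[level_cols].to_numpy(dtype=float, copy=True)
    arr[:, i] = arr[:, : i + 1].sum(axis=1)  # row sum up to and including the level
    arr[:, :i] = 0.0                         # zero everything before the level

    new_cols = [f"{c}_{level}" for c in level_cols]
    df_add = pd.DataFrame(arr, index=df.index, columns=new_cols)
    return pd.concat([df, df_add], axis=1)

# Sample data copied from the question (string column labels assumed)
df = pd.DataFrame({
    "PC": ["PC_1", "PC_1", "PC_2", "PC_3"],
    "0": [0.74, 0.72, 0.80, 0.79],
    "1": [0.25, 0.26, 0.18, 0.19],
    "2": [0.1, 0.1, 0.2, 0.1],
    "3": [0.0, 0.1, 0.0, 0.1],
})

result = add_sum_keep_pc(df, "1")
print(result)
```

Because the level is looked up in a plain Python list here, the label you pass must match the column labels exactly, so pass 1 rather than '1' if your columns are ints.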
