Spring
Spring

Reputation: 193

sum column based on level selected in column header

I have a pd.dataframe and it looks like this. Note column names represent level.

   df
             PC        0     1     2    3 
       0    PC_1      0.74  0.25  0.1  0.0   
       1    PC_1      0.72  0.26  0.1  0.1 
       2    PC_2      0.80  0.18  0.2  0.0
       3    PC_3      0.79  0.19  0.1  0.1 

I want to create another 4 columns next to the existing columns and shift the values based on the condition assigned.

For example: if level =1, df should look like this:

       df
             PC        0     1     2    3    0_1     1_1         2_1  3_1
       0    PC_1      0.74  0.25  0.1  0.0   0.0   (0.72+0.25)   0.1  0.0
       1    PC_1      0.72  0.26  0.1  0.1   0.0   (0.72+0.26)   0.1  0.1 
       2    PC_2      0.80  0.18  0.2  0.0   0.0   (0.80+0.18)   0.2  0.0
       3    PC_3      0.79  0.19  0.1  0.1   0.0   (0.79+0.19)   0.1  0.0

If level=3,

       df
             PC        0     1     2    3    0_3   1_3  2_3   3_3
       0    PC_1      0.74  0.25  0.1  0.0   0.0   0.0  0.0  sum(0.74+0.25+0.1+0.0)
       1    PC_1      0.72  0.26  0.1  0.1   0.0   0.0  0.0  sum(0.72+0.26+0.1+0.1)
       2    PC_2      0.80  0.18  0.2  0.0   0.0   0.0  0.0  sum(0.80+0.18+0.20+0.0)
       3    PC_3      0.79  0.19  0.1  0.1   0.0   0.0  0.0  sum(0.79+0.19+0.1+0.1)

I don't know how to solve the problem and am looking for help.

Thank you in advance.

Upvotes: 0

Views: 57

Answers (2)

Valdi_Bo
Valdi_Bo

Reputation: 30991

As you wrote based on level selected in column header in the title, I understand that:

  • there is no "external" level variable,
  • the level (how many columns to sum) results just from the source column name.

So the task is actually to "concatenate" your both expected results (you presented only how to compute column 1_1 and 3_1) and compute other new columns the same way.

The solution to do it is surprisingly concise. Run the following one-liner:

df = df.join(df.iloc[:, 1:].cumsum(axis=1)
    .rename(lambda name: str(name) + '_1', axis=1))

Details:

  • df.iloc[:, 1:] - Take all rows, starting from column 1 (column numbers from 0).
  • cumsum(axis=1) - Compute cumulative sum, horizontally.
  • rename(..., axis=1) - Rename columns.
  • lambda name: str(name) + '_1' - Lambda function to compute new column name.
  • The result so far - new columns.
  • df = df.join(...) - Join with the original DataFrame and save the result back under df.

Upvotes: 1

ALollz
ALollz

Reputation: 59549

Set 'PC' to the index to make things easier. We zero everything before your column, cumsum up to the column, and keep everything as is after your column.

df = df.set_index('PC')

def add_sum(df, level):
    i = df.columns.get_loc(level)
    df_add = (pd.concat([pd.DataFrame(0, index=df.index, columns=df.columns[:i]), 
                         df.cumsum(1).iloc[:, i], 
                         df.iloc[:, i+1:]], 
                        axis=1)
                .add_suffix(f'_{level}'))

    return pd.concat([df, df_add], axis=1)

add_sum(df, '1')  # 1 if columns labels are int
         0     1    2    3  0_1   1_1  2_1  3_1
PC                                             
PC_1  0.74  0.25  0.1  0.0    0  0.99  0.1  0.0
PC_1  0.72  0.26  0.1  0.1    0  0.98  0.1  0.1
PC_2  0.80  0.18  0.2  0.0    0  0.98  0.2  0.0
PC_3  0.79  0.19  0.1  0.1    0  0.98  0.1  0.1

add_sum(df, '3')
         0     1    2    3  0_3  1_3  2_3   3_3
PC                                             
PC_1  0.74  0.25  0.1  0.0    0    0    0  1.09
PC_1  0.72  0.26  0.1  0.1    0    0    0  1.18
PC_2  0.80  0.18  0.2  0.0    0    0    0  1.18
PC_3  0.79  0.19  0.1  0.1    0    0    0  1.18

Upvotes: 1

Related Questions