Reputation: 193
I have a pd.dataframe and it looks like this. Note column names represent level.
df
PC 0 1 2 3
0 PC_1 0.74 0.25 0.1 0.0
1 PC_1 0.72 0.26 0.1 0.1
2 PC_2 0.80 0.18 0.2 0.0
3 PC_3 0.79 0.19 0.1 0.1
I want to create another 4 columns next to the existing columns and shift the values based on the condition assigned.
For example: if level =1, df should look like this:
df
PC 0 1 2 3 0_1 1_1 2_1 3_1
0 PC_1 0.74 0.25 0.1 0.0 0.0 (0.72+0.25) 0.1 0.0
1 PC_1 0.72 0.26 0.1 0.1 0.0 (0.72+0.26) 0.1 0.1
2 PC_2 0.80 0.18 0.2 0.0 0.0 (0.80+0.18) 0.2 0.0
3 PC_3 0.79 0.19 0.1 0.1 0.0 (0.79+0.19) 0.1 0.0
If level=3,
df
PC 0 1 2 3 0_3 1_3 2_3 3_3
0 PC_1 0.74 0.25 0.1 0.0 0.0 0.0 0.0 sum(0.74+0.25+0.1+0.0)
1 PC_1 0.72 0.26 0.1 0.1 0.0 0.0 0.0 sum(0.72+0.26+0.1+0.1)
2 PC_2 0.80 0.18 0.2 0.0 0.0 0.0 0.0 sum(0.80+0.18+0.20+0.0)
3 PC_3 0.79 0.19 0.1 0.1 0.0 0.0 0.0 sum(0.79+0.19+0.1+0.1)
I don't know how to solve the problem and am looking for help.
Thank you in advance.
Upvotes: 0
Views: 57
Reputation: 30991
As you wrote based on level selected in column header in the title, I understand that:
So the task is actually to "concatenate" your both expected results (you presented only how to compute column 1_1 and 3_1) and compute other new columns the same way.
The solution to do it is surprisingly concise. Run the following one-liner:
df = df.join(df.iloc[:, 1:].cumsum(axis=1)
.rename(lambda name: str(name) + '_1', axis=1))
Details:
df.iloc[:, 1:]
- Take all rows, starting from column 1 (column
numbers from 0).cumsum(axis=1)
- Compute cumulative sum, horizontally.rename(..., axis=1)
- Rename columns.lambda name: str(name) + '_1'
- Lambda function to compute new
column name.df = df.join(...)
- Join with the original DataFrame and save the
result back under df.Upvotes: 1
Reputation: 59549
Set 'PC'
to the index to make things easier. We zero everything before your column, cumsum up to the column, and keep everything as is after your column.
df = df.set_index('PC')
def add_sum(df, level):
i = df.columns.get_loc(level)
df_add = (pd.concat([pd.DataFrame(0, index=df.index, columns=df.columns[:i]),
df.cumsum(1).iloc[:, i],
df.iloc[:, i+1:]],
axis=1)
.add_suffix(f'_{level}'))
return pd.concat([df, df_add], axis=1)
add_sum(df, '1') # 1 if columns labels are int
0 1 2 3 0_1 1_1 2_1 3_1
PC
PC_1 0.74 0.25 0.1 0.0 0 0.99 0.1 0.0
PC_1 0.72 0.26 0.1 0.1 0 0.98 0.1 0.1
PC_2 0.80 0.18 0.2 0.0 0 0.98 0.2 0.0
PC_3 0.79 0.19 0.1 0.1 0 0.98 0.1 0.1
add_sum(df, '3')
0 1 2 3 0_3 1_3 2_3 3_3
PC
PC_1 0.74 0.25 0.1 0.0 0 0 0 1.09
PC_1 0.72 0.26 0.1 0.1 0 0 0 1.18
PC_2 0.80 0.18 0.2 0.0 0 0 0 1.18
PC_3 0.79 0.19 0.1 0.1 0 0 0 1.18
Upvotes: 1