user3118602
user3118602

Reputation: 583

Append new column to DF after sum?

I have a sample dataframe below:

sn C1-1 C1-2 C1-3 H2-1 H2-2 K3-1 K3-2
1   4     3    5    4    1    4    2
2   2     2    0    2    0    1    2
3   1     2    0    0    2    1    2

I will like to sum based on the prefix of C1, H2, K3 and output three new columns with the total sum. The final result is this:

sn total_c1 total_h2 total_k3
1      12      5        6
2      4       2        3
3      3       2        3

What I have tried on my original df:

lst = ["C1", "H2", "K3"]
lst2 = ["total_c1", "total_h2", "total_k3"]

for k in lst:
    idx = df.columns.str.startswith(i)
    for j in lst2:
        df[j] = df.iloc[:,idx].sum(axis=1)
        df1 = df.append(df, sort=False)

But I kept getting error

IndexError: Item wrong length 35 instead of 36.

I can't figure out how to append the new total column to produce my end result in the loop.

Any help will be appreciated (or better suggestion as oppose to loop). Thank you.

Upvotes: 1

Views: 62

Answers (3)

sammywemmy
sammywemmy

Reputation: 28699

Another option, where we create a dictionary to groupby the columns:

mapping = {entry: f"total_{entry[:2]}" for entry in df.columns[1:]}
result = df.groupby(mapping, axis=1).sum()
result.insert(0, "sn", df.sn)
result

   sn   total_C1    total_H2    total_K3
0   1      12          5          6
1   2      4           2          3
2   3      3           2          3

Upvotes: 0

BENY
BENY

Reputation: 323306

Let us try ,split then groupby with it with axis=1

out = df.groupby(df.columns.str.split('-').str[0],axis=1).sum().set_index('sn').add_prefix('Total_').reset_index()
Out[84]: 
   sn  Total_C1  Total_H2  Total_K3
0   1        12         5         6
1   2         4         2         3
2   3         3         2         3

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150765

You can use groupby:

# columns of interest
cols = df.columns[1:]

col_groups = cols.str.split('-').str[0]
out_df = df[['sn']].join(df[cols].groupby(col_groups, axis=1)
                            .sum()
                            .add_prefix('total_')
                        )

Output:

   sn  total_C1  total_H2  total_K3
0   1        12         5         6
1   2         4         2         3
2   3         3         2         3

Upvotes: 1

Related Questions