Summing columns according to pattern in column names

Question

Lets start with very simplified abstract example, I hava a dataframe like this:

import pandas as pd

d = {'1-A': [1, 2], '1-B': [3, 4], '2-A': [3, 4], '5-B': [2, 7]}
df = pd.DataFrame(data=d)

     1-A  1-B  2-A  5-B
0    1    3    3    2
1    2    4    4    7

I'm looking for elegant pandastic solution to have dataframe like this:

     1    2    5
0    4    3    2
1    6    4    7

To make example more concrete column 1-A, means person id=1, expenses category A. Rows are expenses every month. In result, I want to have monthly expenses per person across categories (so column 1 is sum of column 1-A and 1-B). Note that, when there is no expenses, there is no column with 0s. Of course it should be ready for more columns (ids and categories).

I'm quite sure that smart solution with good separation of column selection and summing opeation for this exist.

jezrael · Accepted Answer

Use groupby with lambda function with split and select first value, for grouping by columns add axis=1:

df1 = df.groupby(lambda x: x.split('-')[0], axis=1).sum()
#alternative
#df1 = df.groupby(df.columns.str.split('-').str[0], axis=1).sum()
print (df1) 

   1  2  5
0  4  3  2
1  6  4  7

Summing columns according to pattern in column names

Answers (1)

Related Questions