Creating new columns by adding groups of columns

Question

I have a dataframe

df = pd.DataFrame({ 
    'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'], 
    'Line_Item': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    'Small Business Loans < $100K 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Small Business Loans < $100K 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Small Business Loans < $100K 2018 ($000)': [200, 250, 0, 120, 0, 190],
    'Small Business Loans $100K-$250K 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Small Business Loans $100K-$250K 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Small Business Loans $100K-$250K 2018 ($000)': [200, 250, 0, 120, 0, 190]
    'Multi Family Loans 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Multi Family Loans 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Multi Family Loans 2018 ($000)': [200, 250, 0, 120, 0, 190]
})

I wish to create new columns which add 'Small Business Loans < $100K 2020 ($000)' to 'Small Business Loans $100K-$250K 2020 ($000)' and 'Small Business Loans < $100K 2019 ($000)' to 'Small Business Loans $100K-$250K 2019 ($000)'.

Basically I want to the see the sum of loans by year

The actual data set has many rows and other such sets of columns.

If the columns were identically named in two dataframes I could have used

df_add = df1.add(df2, fill_value=0)

tdy · Accepted Answer

Update: For the Small Business Loans, try a regex filter:

s = '\$(000)'
years = range(2018, 2021)

df.assign(**{
    f'SBL {y} {s}': df.filter(regex=fr'Small Business Loans.*{y}.*{s}').sum(1)
    for y in years
})

To combine both MF and SBL, change Small Business Loans to (Multi Family|Small Business Loans):

df.assign(**{
    f'Loans {y} {s}': df.filter(regex=fr'(Multi Family|Small Business Loans).*{y}.*{s}').sum(1)
    for y in years
})

You can assign() the new columns with a comprehension:

df = df.assign(**{
    f'{i}Q16': df[f'{i}Q16-1'] + df[f'{i}Q16-2'] for i in [1,2,3]
})

Can also do the summing with a like filter:

df = df.assign(**{
    f'{i}Q16': df.filter(like=f'{i}Q16').sum(1) for i in [1,2,3]
})

Output:

      BU Line_Item  1Q16-1  2Q16-1  3Q16-1  1Q16-2  2Q16-2  3Q16-2  1Q16  2Q16  3Q16
0  Total  Revenues     100     100     200     100     100     200   200   200   400 
1  Total       EBT     120       0     250     120       0     250   240     0   500 
2  Total  Expenses       0     130       0       0     130       0     0   260     0 
3    CRS  Revenues     200     200     120     200     200     120   400   400   240 
4    CRS       EBT     190     190       0     190     190       0   380   380     0 
5    CRS  Expenses     210     210     190     210     210     190   420   420   380

Creating new columns by adding groups of columns

Answers (2)

Related Questions