postcolonialist
postcolonialist

Reputation: 661

Creating new columns by adding groups of columns

I have a dataframe

df = pd.DataFrame({ 
    'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'], 
    'Line_Item': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    'Small Business Loans < $100K 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Small Business Loans < $100K 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Small Business Loans < $100K 2018 ($000)': [200, 250, 0, 120, 0, 190],
    'Small Business Loans $100K-$250K 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Small Business Loans $100K-$250K 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Small Business Loans $100K-$250K 2018 ($000)': [200, 250, 0, 120, 0, 190]
    'Multi Family Loans 2020 ($000)': [100, 120, 0, 200, 190, 210],
    'Multi Family Loans 2019 ($000)': [100, 0, 130, 200, 190, 210],
    'Multi Family Loans 2018 ($000)': [200, 250, 0, 120, 0, 190]
})

I wish to create new columns which add 'Small Business Loans < $100K 2020 ($000)' to 'Small Business Loans $100K-$250K 2020 ($000)' and 'Small Business Loans < $100K 2019 ($000)' to 'Small Business Loans $100K-$250K 2019 ($000)'.

Basically I want to the see the sum of loans by year

The actual data set has many rows and other such sets of columns.

If the columns were identically named in two dataframes I could have used

df_add = df1.add(df2, fill_value=0)

Upvotes: 2

Views: 58

Answers (2)

tdy
tdy

Reputation: 41347

Update: For the Small Business Loans, try a regex filter:

s = '\$(000)'
years = range(2018, 2021)

df.assign(**{
    f'SBL {y} {s}': df.filter(regex=fr'Small Business Loans.*{y}.*{s}').sum(1)
    for y in years
})

To combine both MF and SBL, change Small Business Loans to (Multi Family|Small Business Loans):

df.assign(**{
    f'Loans {y} {s}': df.filter(regex=fr'(Multi Family|Small Business Loans).*{y}.*{s}').sum(1)
    for y in years
})

You can assign() the new columns with a comprehension:

df = df.assign(**{
    f'{i}Q16': df[f'{i}Q16-1'] + df[f'{i}Q16-2'] for i in [1,2,3]
})

Can also do the summing with a like filter:

df = df.assign(**{
    f'{i}Q16': df.filter(like=f'{i}Q16').sum(1) for i in [1,2,3]
})

Output:

      BU Line_Item  1Q16-1  2Q16-1  3Q16-1  1Q16-2  2Q16-2  3Q16-2  1Q16  2Q16  3Q16
0  Total  Revenues     100     100     200     100     100     200   200   200   400 
1  Total       EBT     120       0     250     120       0     250   240     0   500 
2  Total  Expenses       0     130       0       0     130       0     0   260     0 
3    CRS  Revenues     200     200     120     200     200     120   400   400   240 
4    CRS       EBT     190     190       0     190     190       0   380   380     0 
5    CRS  Expenses     210     210     190     210     210     190   420   420   380 

Upvotes: 3

Jimmar
Jimmar

Reputation: 4459

So you need to do

df['1Q16-sum'] = df['1Q16-1'] + df['1Q16-2']
df['2Q16-sum'] = df['2Q16-1'] + df['2Q16-2']
df['3Q16-sum'] = df['3Q16-1'] + df['3Q16-2']

or with a loop

# python 3.6+
for i in range(1,4):
    df[f'{i}Q16-sum'] = df[f'{i}Q16-1'] + df[f'{i}Q16-2']

# or other python versions
for i in range(1,4):
    id = str(i) + "Q16"
    df[id+'-sum'] = df[id+'-1'] + df[id+'-2']

giving you

      BU Line_Item  1Q16-1  2Q16-1  3Q16-1  1Q16-2  2Q16-2  3Q16-2  1Qq6-sum  2Qq6-sum  3Qq6-sum
0  Total  Revenues     100     100     200     100     100     200       200       200       400
1  Total       EBT     120       0     250     120       0     250       240         0       500
2  Total  Expenses       0     130       0       0     130       0         0       260         0
3    CRS  Revenues     200     200     120     200     200     120       400       400       240
4    CRS       EBT     190     190       0     190     190       0       380       380         0
5    CRS  Expenses     210     210     190     210     210     190       420       420       380

is that it ?

Upvotes: 0

Related Questions