Reputation: 661
I have a dataframe
df = pd.DataFrame({
'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'],
'Line_Item': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
'Small Business Loans < $100K 2020 ($000)': [100, 120, 0, 200, 190, 210],
'Small Business Loans < $100K 2019 ($000)': [100, 0, 130, 200, 190, 210],
'Small Business Loans < $100K 2018 ($000)': [200, 250, 0, 120, 0, 190],
'Small Business Loans $100K-$250K 2020 ($000)': [100, 120, 0, 200, 190, 210],
'Small Business Loans $100K-$250K 2019 ($000)': [100, 0, 130, 200, 190, 210],
'Small Business Loans $100K-$250K 2018 ($000)': [200, 250, 0, 120, 0, 190]
'Multi Family Loans 2020 ($000)': [100, 120, 0, 200, 190, 210],
'Multi Family Loans 2019 ($000)': [100, 0, 130, 200, 190, 210],
'Multi Family Loans 2018 ($000)': [200, 250, 0, 120, 0, 190]
})
I wish to create new columns which add 'Small Business Loans < $100K 2020 ($000)' to 'Small Business Loans $100K-$250K 2020 ($000)' and 'Small Business Loans < $100K 2019 ($000)' to 'Small Business Loans $100K-$250K 2019 ($000)'.
Basically I want to the see the sum of loans by year
The actual data set has many rows and other such sets of columns.
If the columns were identically named in two dataframes I could have used
df_add = df1.add(df2, fill_value=0)
Upvotes: 2
Views: 58
Reputation: 41347
Update: For the Small Business Loans, try a regex
filter:
s = '\$(000)'
years = range(2018, 2021)
df.assign(**{
f'SBL {y} {s}': df.filter(regex=fr'Small Business Loans.*{y}.*{s}').sum(1)
for y in years
})
To combine both MF and SBL, change Small Business Loans
to (Multi Family|Small Business Loans)
:
df.assign(**{
f'Loans {y} {s}': df.filter(regex=fr'(Multi Family|Small Business Loans).*{y}.*{s}').sum(1)
for y in years
})
You can assign()
the new columns with a comprehension:
df = df.assign(**{
f'{i}Q16': df[f'{i}Q16-1'] + df[f'{i}Q16-2'] for i in [1,2,3]
})
Can also do the summing with a like
filter:
df = df.assign(**{
f'{i}Q16': df.filter(like=f'{i}Q16').sum(1) for i in [1,2,3]
})
Output:
BU Line_Item 1Q16-1 2Q16-1 3Q16-1 1Q16-2 2Q16-2 3Q16-2 1Q16 2Q16 3Q16
0 Total Revenues 100 100 200 100 100 200 200 200 400
1 Total EBT 120 0 250 120 0 250 240 0 500
2 Total Expenses 0 130 0 0 130 0 0 260 0
3 CRS Revenues 200 200 120 200 200 120 400 400 240
4 CRS EBT 190 190 0 190 190 0 380 380 0
5 CRS Expenses 210 210 190 210 210 190 420 420 380
Upvotes: 3
Reputation: 4459
So you need to do
df['1Q16-sum'] = df['1Q16-1'] + df['1Q16-2']
df['2Q16-sum'] = df['2Q16-1'] + df['2Q16-2']
df['3Q16-sum'] = df['3Q16-1'] + df['3Q16-2']
or with a loop
# python 3.6+
for i in range(1,4):
df[f'{i}Q16-sum'] = df[f'{i}Q16-1'] + df[f'{i}Q16-2']
# or other python versions
for i in range(1,4):
id = str(i) + "Q16"
df[id+'-sum'] = df[id+'-1'] + df[id+'-2']
giving you
BU Line_Item 1Q16-1 2Q16-1 3Q16-1 1Q16-2 2Q16-2 3Q16-2 1Qq6-sum 2Qq6-sum 3Qq6-sum
0 Total Revenues 100 100 200 100 100 200 200 200 400
1 Total EBT 120 0 250 120 0 250 240 0 500
2 Total Expenses 0 130 0 0 130 0 0 260 0
3 CRS Revenues 200 200 120 200 200 120 400 400 240
4 CRS EBT 190 190 0 190 190 0 380 380 0
5 CRS Expenses 210 210 190 210 210 190 420 420 380
is that it ?
Upvotes: 0