Trying to find the best way to generate an 'Other' row in my pandas dataframe. 'Other' is calculated by adding up all the Source values that are not 'Total' and then subtracting by the 'Total' values. Ex: 'Other' = Total - (Souce_1 + Souce_2 + Souce_3) Here's an example of what I am starting with: Name Source Lead Sale Prop_A Source_1 100 3 Prop_A Source_2 50 5 Prop_A Source_3 20 0 Prop_A Total 300 11 Prop_B Source_1 200 10 Prop_B Source_2 300 6 Prop_B Source_3 20 0 Prop_B Total 700 23 And this is what I am try to create: Name Source Lead Sale Prop_A Source_1 100 3 Prop_A Source_2 50 5 Prop_A Source_3 20 0 Prop_A Other 130 3 Prop_A Total 300 11 Prop_B Source_1 200 10 Prop_B Source_2 300 6 Prop_B Source_3 20 0 Prop_B Other 180 7 Prop_B Total 700 23 I was able to calculate the 'Other' row by using following code, but know this isn't the best way to do it. Wondering if anyone knows a better way? Total_df = df[df['Source'] == 'Total'] All_Sources_df = df[df['Source'] != 'Total'] All_Sources_df = All_Sources_df.groupby(['Name'], as_index=False).sum() result = pd.merge(Total_df, All_Sources_df, on=['Name']) result['Lead'] = result['Lead_x'] - result['Lead_y'] result['Sale'] = result['Sale_x'] - result['Sale_y'] result = result[['Name', 'Lead', 'Sale']] result['Source'] = 'Other' result = result[['Name','Source','Lead','Sale']]

Reputation: 23

Creating a Calculated Row Based on Multiple Conditions in a Dataframe with Python

Trying to find the best way to generate an 'Other' row in my pandas dataframe. 'Other' is calculated by adding up all the Source values that are not 'Total' and then subtracting by the 'Total' values.

Ex: 'Other' = Total - (Souce_1 + Souce_2 + Souce_3)

Here's an example of what I am starting with:

Name	Source	Lead	Sale
Prop_A	Source_1	100	3
Prop_A	Source_2	50	5
Prop_A	Source_3	20	0
Prop_A	Total	300	11
Prop_B	Source_1	200	10
Prop_B	Source_2	300	6
Prop_B	Source_3	20	0
Prop_B	Total	700	23

And this is what I am try to create:

Name	Source	Lead	Sale
Prop_A	Source_1	100	3
Prop_A	Source_2	50	5
Prop_A	Source_3	20	0
Prop_A	Other	130	3
Prop_A	Total	300	11
Prop_B	Source_1	200	10
Prop_B	Source_2	300	6
Prop_B	Source_3	20	0
Prop_B	Other	180	7
Prop_B	Total	700	23

I was able to calculate the 'Other' row by using following code, but know this isn't the best way to do it. Wondering if anyone knows a better way?

Total_df = df[df['Source'] == 'Total']
All_Sources_df = df[df['Source'] != 'Total']

All_Sources_df = All_Sources_df.groupby(['Name'], as_index=False).sum()

result = pd.merge(Total_df, All_Sources_df, on=['Name'])

result['Lead'] =  result['Lead_x'] - result['Lead_y']
result['Sale'] =  result['Sale_x'] - result['Sale_y']

result = result[['Name', 'Lead', 'Sale']]

result['Source'] = 'Other'

result = result[['Name','Source','Lead','Sale']]

Upvotes: 2

Answers (3)

RomanPerekhrest

Reputation: 92854

With custom apply function and pd.concat (to arrange records for each group):

def f(x):
    m = x['Source'].eq('Total')  # mask denoting Total record
    t = x.loc[m, ['Lead', 'Sale']]  # 'Total' values
    return pd.concat([x[~m], t.sub(x.loc[~m, ['Lead', 'Sale']].sum())
                     .assign(Source='Other', Name=x.Name), x[m]])

df = df.groupby('Name', as_index=False).apply(f).reset_index(drop=True)

     Name    Source  Lead  Sale
0  Prop_A  Source_1   100     3
1  Prop_A  Source_2    50     5
2  Prop_A  Source_3    20     0
3  Prop_A     Other   130     3
4  Prop_A     Total   300    11
5  Prop_B  Source_1   200    10
6  Prop_B  Source_2   300     6
7  Prop_B  Source_3    20     0
8  Prop_B     Other   180     7
9  Prop_B     Total   700    23

Upvotes: 1

Corralien

Reputation: 120469

You can use:

cols = ['Name', 'Lead', 'Sale']
m = df['Source'] == 'Total'

other = (df.loc[m, cols].set_index('Name')
           .sub(df.loc[~m, cols].groupby('Name').sum())
           .assign(Source='Other').reset_index())

result = pd.concat([df, other]).sort_values(['Name', 'Source'], ignore_index=True)
print(result)

# Output
     Name    Source  Lead  Sale
0  Prop_A     Other   130     3
1  Prop_A  Source_1   100     3
2  Prop_A  Source_2    50     5
3  Prop_A  Source_3    20     0
4  Prop_A     Total   300    11
5  Prop_B     Other   180     7
6  Prop_B  Source_1   200    10
7  Prop_B  Source_2   300     6
8  Prop_B  Source_3    20     0
9  Prop_B     Total   700    23

Upvotes: 1

Anoushiravan R

Reputation: 21938

You could try this. I assumed you may have more columns other than Total, so in this case we can use ~ with pd.DataFrame.isin to refer to all the other columns other that Total :

(df.set_index('Name')
 .pivot(columns='Source')
 .stack(level=0)
 .assign(other = lambda d: d['Total'] - (d.loc[:, ~ d.columns.isin(['Total'])].sum(axis=1)))
 .stack()
 .reset_index()
 .pivot(index=['Name', 'Source'], columns='level_1')[0])


level_1          Lead  Sale
Name   Source              
Prop_A Source_1   100     3
       Source_2    50     5
       Source_3    20     0
       Total      300    11
       other      130     3
Prop_B Source_1   200    10
       Source_2   300     6
       Source_3    20     0
       Total      700    23
       other      180     7

Upvotes: 1

Creating a Calculated Row Based on Multiple Conditions in a Dataframe with Python

Answers (3)

Related Questions