Andres Green
Andres Green

Reputation: 61

Add values in columns if criteria from another column is met

I have the following DataFrame

import pandas as pd 

d = {'Client':[1,2,3,4],'Salesperson':['John','John','Bob','Richard'], 
     'Amount':[1000,1000,0,500],'Salesperson 2':['Bob','Richard','John','Tom'],
     'Amount2':[400,200,300,500]}
df = pd.DataFrame(data=d)
Client Salesperson Amount Salesperson Amount2
1 John 1000 Bob 400
2 John 1000 Richard 200
3 Bob 0 John 300
4 Richard 500 Tom 500

And I just need to create some sort of "sumif" statement (the one from excel) that will add the amount each salesperson is due. I don't know how to iterate over each row, but I want to have it so that it adds the values in "Amount" and "Amount2" for each one of the salespersons.

Then I need to be able to see the amount per salesperson.

Expected Output (Ideally in a DataFrame as well)

Sales Person Total Amount
John 2300
Bob 400
Richard 700
Tom 500

Upvotes: 1

Views: 426

Answers (2)

Vaishali
Vaishali

Reputation: 38415

There can be multiple ways of solving this. One option is to use Pandas Concat to join required columns and use groupby

merged_df = pd.concat([df[['Salesperson','Amount']], df[['Salesperson 2', 'Amount2']].rename(columns={'Salesperson 2':'Salesperson','Amount2':'Amount'})])
merged_df.groupby('Salesperson',as_index = False)['Amount'].sum()

you get

    Salesperson Amount
0   Bob         400
1   John        2300
2   Richard     700
3   Tom         500

Edit: If you have another pair of salesperson/amount, you can add that to the concat

d = {'Client':[1,2,3,4],'Salesperson':['John','John','Bob','Richard'], 
     'Amount':[1000,1000,0,500],'Salesperson 2':['Bob','Richard','John','Tom'],
     'Amount2':[400,200,300,500], 'Salesperson 3':['Nick','Richard','Sam','Bob'],
     'Amount3':[400,800,100,400]}
df = pd.DataFrame(data=d)

merged_df = pd.concat([df[['Salesperson','Amount']], df[['Salesperson 2', 'Amount2']].rename(columns={'Salesperson 2':'Salesperson','Amount2':'Amount'}), df[['Salesperson 3', 'Amount3']].rename(columns={'Salesperson 3':'Salesperson','Amount3':'Amount'})])
merged_df.groupby('Salesperson',as_index = False)['Amount'].sum()

    Salesperson Amount
0   Bob         800
1   John        2300
2   Nick        400
3   Richard     1500
4   Sam         100
5   Tom         500

Edit 2: Another solution using pandas wide_to_long

df = df.rename({'Salesperson':'Salesperson 1','Amount':'Amount1'}, axis='columns')
reshaped_df = pd.wide_to_long(df, stubnames=['Salesperson','Amount'], i='Client',j='num', suffix='\s?\d+').reset_index(drop = 1)

The above will reshape df,

    Salesperson Amount
0   John    1000
1   John    1000
2   Bob     0
3   Richard 500
4   Bob     400
5   Richard 200
6   John    300
7   Tom     500
8   Nick    400
9   Richard 800
10  Sam     100
11  Bob     400

A simple groupby on reshaped_df will give you required output

reshaped_df.groupby('Salesperson', as_index = False)['Amount'].sum()

Upvotes: 3

sammywemmy
sammywemmy

Reputation: 28644

One option is to tidy the dataframe into long form, where all the Salespersons are in one column, and the amounts are in another, then you can groupby and get the aggregate.

Let's use pivot_longer from pyjanitor to transform to long form:

# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(
     index="Client",
     names_to=".value",
     names_pattern=r"([a-zA-Z]+).*",
     )
.groupby("Salesperson", as_index = False)
.Amount
.sum()
)

  Salesperson  Amount
0         Bob     400
1        John    2300
2     Richard     700
3         Tom     500

The .value tells the function to keep only those parts of the column that match it as headers. The columns have a pattern (They start with a text - either Salesperson or Amount - and either have a number at the end ( or not). This pattern is captured in names_pattern. .value is paired with the regex in the brackets, those outside do not matter in this case.

Once transformed into long form, it is easy to groupby and aggregate. The as_index parameter allows us to keep the output as a dataframe.

Upvotes: 0

Related Questions