Owen Jenkins
Owen Jenkins

Reputation: 11

Filtering within groupby function pandas

I'm working in pandas and have a dataframe of sales information with a TradeID, Seller ID,and a Payment date, and I want to calculate for each Seller ID the number of previous succesful Trades, before the current trade date, that the seller has had. I'm fairly sure that it's a split apply combine style solution but I can't figure out how to achieve it.

The data looks a bit like this

|---------|-----------|-------------|------------|
|Trade ID | Seller ID | Payment Date|Success?    |
|---------|-----------|-------------|------------|
|001      |001        |08/12/2016   |Yes         |
|002      |001        |07/12/2016   |Yes         |
|003      |001        |12/12/2016   |No          |
|004      |001        |13/12/2016   |Yes         |
|005      |001        |14/12/2016   |Null        |
|005      |002        |18/01/2012   |Yes         |
|006      |002        |01/01/2011   |Null        |
|007      |003        |07/02/2010   |Yes         |

and I'm looking for an output that would be something like:

|---------|-----------|-------------|------------|--------------------------|
|Trade ID | Seller ID | Payment Date|Success?    |Previous Succesful Trades |
|---------|-----------|-------------|------------|--------------------------|
|001      |001        |08/12/2016   |Yes         |1                         |
|002      |001        |07/12/2016   |Yes         |0                         |
|003      |001        |12/12/2016   |No          |2                         |
|004      |001        |13/12/2016   |Yes         |2                         |
|005      |001        |14/12/2016   |Null        |3                         |
|005      |002        |18/01/2012   |Yes         |1                         |
|006      |002        |01/01/2011   |Null        |0                         |
|007      |003        |07/02/2010   |Yes         |0                         |

I can aggregate the successful trades for each seller, but this does not take in to account the fact that at the point the seller makes the trade they will not have completed all the trades in the dataframe. For example on 12/12/2016, seller 1 will not have completed the trade on 13/12/2016 therefore their previous successfully completed trades is 2 and not 3.

Any help greatly appreciated !

Upvotes: 1

Views: 88

Answers (4)

hlhorsnellphd
hlhorsnellphd

Reputation: 33

The pandas documentation is quite helpful for figuring these solutions! Below is one possibility that will also allow for recording unsuccessful trades.

df = pd.DataFrame(
{"Trade ID":[1,2,3,4,5,6],
"Seller ID":[1,1,2,2,2,2],
"Payment Date":["2016-12=08","2016-12-12","2012-01-18","2016-02=08","2016-12-16","2012-01-31"],
"Success?":["Yes","No", "Yes", "Yes","No", "Yes"]})

Then group on Seller ID

grouped_seller = df.groupby("Seller ID")

Then you can do multiple different things, but value counts will count how many times YES or No is for each seller ID

grouped_seller["Success?"].value_counts()


Seller ID  Success?
1          No          1
           Yes         1
2          Yes         3
           No          1
Name: Success?, dtype: int64

You could then access the seller id using .loc[]

grouped_seller["Success?"].value_counts().loc[2]

Success?
Yes    3
No     1
Name: Success?, dtype: int64

N.B. the dataframe I create will not have the same dtypes as yours unless you convert dates to datetime

Upvotes: 1

mullinscr
mullinscr

Reputation: 1738

The easiest way I can think of is to first of all replace the 'Yes' and 'No' with True and False, then groupby the Seller ID and sum the Success?.

df['Success?'] = df['Success?'].replace({'Yes':True, 'No': False})
df.groupby('Seller ID')['Success?'].sum()

Upvotes: 1

r.burak
r.burak

Reputation: 544

df.loc[( df['Success?'] == 'Yes' )  , 'Succ'] = 1

df['sumofprevios'] = ( df.groupby('Seller ID')['Succ'].apply(lambda x: x.shift().expanding().sum()))

# you can drop Succ column now. 

Please let me know if this is not what you need.

Upvotes: 1

Epsi95
Epsi95

Reputation: 9047

You can apply filter, then use groupby and count

df[df['Success'] == 'Yes'].groupby(['Seller ID'])['Success'].count()
Seller ID
001    1
002    1

Upvotes: 1

Related Questions