Reputation: 11
I'm working in pandas and have a dataframe of sales information with a TradeID, Seller ID,and a Payment date, and I want to calculate for each Seller ID the number of previous succesful Trades, before the current trade date, that the seller has had. I'm fairly sure that it's a split apply combine style solution but I can't figure out how to achieve it.
The data looks a bit like this
|---------|-----------|-------------|------------|
|Trade ID | Seller ID | Payment Date|Success? |
|---------|-----------|-------------|------------|
|001 |001 |08/12/2016 |Yes |
|002 |001 |07/12/2016 |Yes |
|003 |001 |12/12/2016 |No |
|004 |001 |13/12/2016 |Yes |
|005 |001 |14/12/2016 |Null |
|005 |002 |18/01/2012 |Yes |
|006 |002 |01/01/2011 |Null |
|007 |003 |07/02/2010 |Yes |
and I'm looking for an output that would be something like:
|---------|-----------|-------------|------------|--------------------------|
|Trade ID | Seller ID | Payment Date|Success? |Previous Succesful Trades |
|---------|-----------|-------------|------------|--------------------------|
|001 |001 |08/12/2016 |Yes |1 |
|002 |001 |07/12/2016 |Yes |0 |
|003 |001 |12/12/2016 |No |2 |
|004 |001 |13/12/2016 |Yes |2 |
|005 |001 |14/12/2016 |Null |3 |
|005 |002 |18/01/2012 |Yes |1 |
|006 |002 |01/01/2011 |Null |0 |
|007 |003 |07/02/2010 |Yes |0 |
I can aggregate the successful trades for each seller, but this does not take in to account the fact that at the point the seller makes the trade they will not have completed all the trades in the dataframe. For example on 12/12/2016, seller 1 will not have completed the trade on 13/12/2016 therefore their previous successfully completed trades is 2 and not 3.
Any help greatly appreciated !
Upvotes: 1
Views: 88
Reputation: 33
The pandas documentation is quite helpful for figuring these solutions! Below is one possibility that will also allow for recording unsuccessful trades.
df = pd.DataFrame(
{"Trade ID":[1,2,3,4,5,6],
"Seller ID":[1,1,2,2,2,2],
"Payment Date":["2016-12=08","2016-12-12","2012-01-18","2016-02=08","2016-12-16","2012-01-31"],
"Success?":["Yes","No", "Yes", "Yes","No", "Yes"]})
Then group on Seller ID
grouped_seller = df.groupby("Seller ID")
Then you can do multiple different things, but value counts will count how many times YES or No is for each seller ID
grouped_seller["Success?"].value_counts()
Seller ID Success?
1 No 1
Yes 1
2 Yes 3
No 1
Name: Success?, dtype: int64
You could then access the seller id using .loc[]
grouped_seller["Success?"].value_counts().loc[2]
Success?
Yes 3
No 1
Name: Success?, dtype: int64
N.B. the dataframe I create will not have the same dtypes as yours unless you convert dates to datetime
Upvotes: 1
Reputation: 1738
The easiest way I can think of is to first of all replace the 'Yes' and 'No' with True and False, then groupby the Seller ID
and sum the Success?
.
df['Success?'] = df['Success?'].replace({'Yes':True, 'No': False})
df.groupby('Seller ID')['Success?'].sum()
Upvotes: 1
Reputation: 544
df.loc[( df['Success?'] == 'Yes' ) , 'Succ'] = 1
df['sumofprevios'] = ( df.groupby('Seller ID')['Succ'].apply(lambda x: x.shift().expanding().sum()))
# you can drop Succ column now.
Please let me know if this is not what you need.
Upvotes: 1
Reputation: 9047
You can apply filter, then use groupby
and count
df[df['Success'] == 'Yes'].groupby(['Seller ID'])['Success'].count()
Seller ID
001 1
002 1
Upvotes: 1