Reputation: 1470
I am trying to remove rows from a pandas DataFrame with multiple criteria.
The criteria is if Trader_x buys and sells on the same date, remove both the buy and sell rows for that date.
This is complex because it requires the trader and date to be unique for the lookup. I do not want to remove rows if only the date matches buy and sell. It has to be trader_x and date dependant.
I have tried many methods including pivot tables and merges but I cannot get anything close to working.
The input data looks like this:
Trader_ID Date Type
Trader_1 5/4/11 Sell
Trader_1 13/4/11 Sell
Trader_1 4/4/12 Sell
Trader_1 3/4/13 Sell
Trader_1 2/5/13 Sell
Trader_1 9/7/13 Sell
Trader_1 5/5/14 Buy
Trader_1 5/5/14 Sell
Trader_1 5/5/15 Buy
Trader_2 5/5/15 Sell
Trader_2 15/3/13 Buy
Trader_2 15/3/13 Sell
Trader_2 1/4/13 Buy
Trader_2 1/4/13 Sell
Trader_2 15/4/13 Buy
Trader_2 15/4/13 Sell
Trader_2 29/4/13 Buy
Trader_2 30/4/13 Sell
Trader_2 22/5/13 Sell
Example DataFrame below if anyone is willing to try.
data = {'trader_id': ['Trader_1','Trader_1','Trader_1','Trader_1','Trader_1','Trader_1','Trader_1',
'Trader_1','Trader_1','Trader_2','Trader_2','Trader_2','Trader_2','Trader_2','Trader_2',
'Trader_2','Trader_2','Trader_2','Trader_2'],
'date': ['1/4/13', '13/4/11', '4/4/12', '3/4/13', '2/5/13', '9/7/13', '5/5/14', '5/5/14', '5/5/15',
'5/5/15', '15/3/13', '15/3/13', '1/4/13', '1/4/13', '15/4/13', '15/4/13', '29/4/13', '30/4/13', '22/5/13'],
'type': ['Sell', 'Sell', 'Sell', 'Sell', 'Sell', 'Sell', 'Buy', 'Sell', 'Buy', 'Sell', 'Buy',
'Sell', 'Buy', 'Sell', 'Buy', 'Sell', 'Buy', 'Sell', 'Sell']}
df = pd.DataFrame(data)
The (example) dataset has a total input of 19 rows. The output should only have 11 rows.
The output data should look like this:
Trader_ID Date Type
Trader_1 5/4/11 Sell
Trader_1 13/4/11 Sell
Trader_1 4/4/12 Sell
Trader_1 3/4/13 Sell
Trader_1 2/5/13 Sell
Trader_1 9/7/13 Sell
Trader_1 5/5/15 Buy
Trader_2 5/5/15 Sell
Trader_2 29/4/13 Buy
Trader_2 30/4/13 Sell
Trader_2 22/5/13 Sell
Would love some guidance on this one - I'm stuck. Thanks.
Upvotes: 2
Views: 483
Reputation: 294258
It seems you never have a single trader with multiple sells or buys in a single day. So, we can filter on number of trades being equal to 1
df.groupby(['trader_id', 'date']).filter(lambda x: len(x) == 1)
date trader_id type
0 1/4/13 Trader_1 Sell
1 13/4/11 Trader_1 Sell
2 4/4/12 Trader_1 Sell
3 3/4/13 Trader_1 Sell
4 2/5/13 Trader_1 Sell
5 9/7/13 Trader_1 Sell
8 5/5/15 Trader_1 Buy
9 5/5/15 Trader_2 Sell
16 29/4/13 Trader_2 Buy
17 30/4/13 Trader_2 Sell
18 22/5/13 Trader_2 Sell
Upvotes: 1
Reputation: 214957
Your logic can be boiled down to keeping date for a trader if it contains only one type of trading, either Buy or Sell, you could use groupby.filter
:
df.groupby(['date', 'trader_id']).filter(lambda g: (g.type == "Sell").all() or (g.type == "Buy").all())
Note the data used is what is provided in the dictionary which is slightly different from the data in the code block.
Upvotes: 3