mattblack
mattblack

Reputation: 1470

Remove rows from pandas DataFrame using multiple criteria

I am trying to remove rows from a pandas DataFrame with multiple criteria.

The criteria is if Trader_x buys and sells on the same date, remove both the buy and sell rows for that date.

This is complex because it requires the trader and date to be unique for the lookup. I do not want to remove rows if only the date matches buy and sell. It has to be trader_x and date dependant.

I have tried many methods including pivot tables and merges but I cannot get anything close to working.

The input data looks like this:

Trader_ID   Date    Type
Trader_1    5/4/11  Sell
Trader_1    13/4/11 Sell
Trader_1    4/4/12  Sell
Trader_1    3/4/13  Sell
Trader_1    2/5/13  Sell
Trader_1    9/7/13  Sell
Trader_1    5/5/14  Buy
Trader_1    5/5/14  Sell
Trader_1    5/5/15  Buy
Trader_2    5/5/15  Sell
Trader_2    15/3/13 Buy
Trader_2    15/3/13 Sell
Trader_2    1/4/13  Buy
Trader_2    1/4/13  Sell
Trader_2    15/4/13 Buy
Trader_2    15/4/13 Sell
Trader_2    29/4/13 Buy
Trader_2    30/4/13 Sell
Trader_2    22/5/13 Sell

Example DataFrame below if anyone is willing to try.

data = {'trader_id': ['Trader_1','Trader_1','Trader_1','Trader_1','Trader_1','Trader_1','Trader_1',
      'Trader_1','Trader_1','Trader_2','Trader_2','Trader_2','Trader_2','Trader_2','Trader_2',
      'Trader_2','Trader_2','Trader_2','Trader_2'],
     'date': ['1/4/13', '13/4/11', '4/4/12', '3/4/13', '2/5/13', '9/7/13', '5/5/14', '5/5/14', '5/5/15',
      '5/5/15', '15/3/13', '15/3/13', '1/4/13', '1/4/13', '15/4/13', '15/4/13', '29/4/13', '30/4/13', '22/5/13'],
       'type': ['Sell', 'Sell', 'Sell', 'Sell', 'Sell', 'Sell', 'Buy', 'Sell', 'Buy', 'Sell', 'Buy', 
                'Sell', 'Buy', 'Sell', 'Buy', 'Sell', 'Buy', 'Sell', 'Sell']}

df = pd.DataFrame(data)

The (example) dataset has a total input of 19 rows. The output should only have 11 rows.

The output data should look like this:

Trader_ID   Date    Type
Trader_1    5/4/11  Sell
Trader_1    13/4/11 Sell
Trader_1    4/4/12  Sell
Trader_1    3/4/13  Sell
Trader_1    2/5/13  Sell
Trader_1    9/7/13  Sell
Trader_1    5/5/15  Buy
Trader_2    5/5/15  Sell
Trader_2    29/4/13 Buy
Trader_2    30/4/13 Sell
Trader_2    22/5/13 Sell

Would love some guidance on this one - I'm stuck. Thanks.

Upvotes: 2

Views: 483

Answers (2)

piRSquared
piRSquared

Reputation: 294258

It seems you never have a single trader with multiple sells or buys in a single day. So, we can filter on number of trades being equal to 1

df.groupby(['trader_id', 'date']).filter(lambda x: len(x) == 1)

       date trader_id  type
0    1/4/13  Trader_1  Sell
1   13/4/11  Trader_1  Sell
2    4/4/12  Trader_1  Sell
3    3/4/13  Trader_1  Sell
4    2/5/13  Trader_1  Sell
5    9/7/13  Trader_1  Sell
8    5/5/15  Trader_1   Buy
9    5/5/15  Trader_2  Sell
16  29/4/13  Trader_2   Buy
17  30/4/13  Trader_2  Sell
18  22/5/13  Trader_2  Sell

Upvotes: 1

akuiper
akuiper

Reputation: 214957

Your logic can be boiled down to keeping date for a trader if it contains only one type of trading, either Buy or Sell, you could use groupby.filter:

df.groupby(['date', 'trader_id']).filter(lambda g: (g.type == "Sell").all() or (g.type == "Buy").all())

enter image description here

Note the data used is what is provided in the dictionary which is slightly different from the data in the code block.

Upvotes: 3

Related Questions