Reputation: 75
Consider a dataframe having 4 columns-
So I am looking to identify rows which are similar on Currency Pair , Notional Quantity, but opposite trade types - Buy and Sell
import pandas as pd
trade_id=[1,2,3,4,5,6] #dtype = int64
ccy_pairs=['AUD','AUD','GBP','EUR','NZD','NZD']#dtype = str
notional=[1,1,1.5,2,6,7]#dtype = int64
trade_type=['buy','sell','buy','sell','buy','buy']#dtype = str
value_date=['01012018', '03012019', '05062018','03062018','07082018','09082020']#dtype = datetime
df=pd.DataFrame() #dataframe comprising of many other columns
df['trade_id']=trade_id
df['ccy_pairs']=ccy_pairs
df['notional']=notional
df['trade_type']=trade_type
df['value_date']=value_date
#Output expected - Looking to highlight the offsetting legs of the trade ( i.e. trades having same notional and ccy pair,
#but different trade types )
Trade Id|CCY Pair|Notional|Trade_type|value_date
1 aud 1 Buy 01012018
3 gbp 1.5 Buy 05062018
4 eur 2 Sell 07062018
5 nzd 6 Buy 07082018
6 nzd 7 Buy 09092020
This means that 2 rows which matched on CCY an Notional but had opposing legs (Buy and Sell) resulted in one of them (either) getting dropped
Upvotes: 2
Views: 111
Reputation: 2668
Condition of two rows to drop (one of them):
"(duplicated rows in ccy_pairs
AND notional
) AND (not duplicated in trade_type
)"
drop_duplicates
won't check opposing legs (buy and sell).
You can try this (I assume to always drop the second find (.index[1]
)):
dups = df.ccy_pairs[df.ccy_pairs.duplicated()] # to get AUD and NZD
for i in dups: # to check opposing legs
if df.trade_type[df.ccy_pairs == i].nunique() == 2:
df.drop(df[df.ccy_pairs == i].index[1], inplace=True)
Upvotes: 1
Reputation: 13426
You need:
df.drop_duplicates(subset=['ccy_pairs','notional'], keep='first', inplace=True)
output
trade_id ccy_pairs notional trade_type value_date
0 1 AUD 1.0 buy 01012018
2 3 GBP 1.5 buy 05062018
3 4 EUR 2.0 sell 03062018
4 5 NZD 6.0 buy 07082018
5 6 NZD 7.0 buy 09082020
For more detail refer this
Upvotes: 1