Hardeep
Hardeep

Reputation: 1

Find Matching rows in the data frame by comparing all rows based on certain conditions

I'm fairly new to python and would appreciate if someone can guide me in the right direction.

I have a dataset that has unique trades in each row. I need to find all rows that match on certain conditions. Basically, find any offsetting trades that fit a certain condition. For example:

Find trades that have the same REF_RATE, RECEIVE is within a difference of 5, MATURITY_DATE is with 7 days of each other. I have attached the image of data.

Thank You.

enter image description here

Upvotes: 0

Views: 51

Answers (1)

think-maths
think-maths

Reputation: 967

You can use groupby to achieve this. As per you requirement specific to this ask Find trades that have the same REF_RATE, RECEIVE is within a difference of 5, MATURITY_DATE is with 7 days of each other you can proceed like this.

#sample data created from the image of your dataset

>>> data = {'Maturity_Date':['2/01/2021','10/01/2021','10/01/2021','6/06/2021'],'Trade_id':['10484','12880','11798','19561'],'REF_RATE':['BBSW','BBSW','OIS','BBSW'],'Recive':[1.5,1.25,2,10]}
>>> df = pd.DataFrame(data)
>>> df
  Maturity_Date Trade_id REF_RATE  Recive
0     2/01/2021    10484     BBSW    1.50
1    10/01/2021    12880     BBSW    1.25
2    10/01/2021    11798      OIS    2.00
3     6/06/2021    19561     BBSW   10.00

#convert Maturity_Date to datetime format and sort REF_RATE by date if needed

>>> df['Maturity_Date'] =  pd.to_datetime(df['Maturity_Date'], dayfirst=True)
>>> df['Maturity_Date'] = df.groupby('REF_RATE')['Maturity_Date'].apply(lambda x: x.sort_values()) #if needed
>>> df
  Maturity_Date Trade_id REF_RATE  Recive
0    2021-01-02    10484     BBSW    1.50
1    2021-01-10    12880     BBSW    1.25
2    2021-01-10    11798      OIS    2.00
3    2021-06-06    19561     BBSW   10.00

#groupby of REF_RATE and apply condition on date and receive column

>>> df['date_diff>7'] = df.groupby('REF_RATE')['Maturity_Date'].diff() / np.timedelta64(1, 'D') > 7
>>> df['rate_diff>5'] = df.groupby('REF_RATE')['Recive'].diff() > 5
>>> df
  Maturity_Date Trade_id REF_RATE  Recive  date_diff>7  rate_diff>5
0    2021-01-02    10484     BBSW    1.50        False        False
1    2021-01-10    12880     BBSW    1.25         True        False #date_diff true as for BBSW Maturity date is more than 7
2    2021-01-10    11798      OIS    2.00        False        False
3    2021-06-06    19561     BBSW   10.00         True         True #rate_diff and date_diff true because date>7 and receive difference>5

Upvotes: 1

Related Questions