Reputation: 79
I am new to python, wanted a help to merge two datafames on :
a) Should have same value in a numeric field df1_UL_GTP_TEID_0_int
and df2_TEID_UL_int
b) difference between df1 START_TIME_roundoff
and df2 TS_START_roundoff
should be 1 minute.
df1
START_TIME_roundoff UL_GTP_TEID_0_int TIMSI/MSIN
46 2020-03-10 12:00:00 1196907781 3.240371e+09
190 2020-03-10 12:01:00 1147678181 3.244522e+09
308 2020-03-10 12:05:00 1147678181 3.244522e+09
496 2020-03-10 12:07:00 1691830165 3.252351e+09
632 2020-03-10 12:12:00 1809929829 3.237458e+09
df2
S_START_roundoff TEID_UL_int DIR
1 2020-10-03 09:59:00 1973380469 1
2 2020-10-03 10:00:00 2041336357 2
4 2020-10-03 12:06:00 1147678181 12
5 2020-10-03 09:57:00 1295205669 1
6 2020-10-03 12:12:00 1809929829 13
<<< Expected OUTPUT >>>
row 308 of df_1 should merge with row 4 of df_2
and row 632 of df_1 should merge with row 6 of df_2
logic :
If absolute value of |df1.START_TIME_roundoff - df2.TS_START_roundoff| <= 1 then
df_new = pd.merge(df_1, df_2, how='inner', left_on='UL_GTP_TEID_0_int', right_on='TEID_UL_int')
thanks in advance,
Upvotes: 0
Views: 95
Reputation: 635
Using the dataframes below:
df1= pd.DataFrame({'A': [1, 2, 3],
'B': ['2019-01-01 10:00:00', '2019-01-02 12:20:00', '2019-01-01 10:00:00'],
'C': ['ID1', 'ID2', 'ID3']})
df2= pd.DataFrame({'D': ['D1', 'D2', 'D3'],
'E': ['2019-01-01 10:00:59', '2019-01-02 12:21:20', '2019-01-01 09:59:30'],
'F': ['ID1', 'ID2', 'ID3']})
If I understood your condition to join two datasets correctly, below code seems to do the job:
df = pd.merge(df1, df2, left_on='C', right_on='F', how='inner')
df['B'] = pd.to_datetime(df['B']) # Ensure it's datetime
df['E'] = pd.to_datetime(df['E']) # Ensure it's datetime
df['delta']=abs(df['B'] - df['E'])/np.timedelta64(1,'m') # Find the abs diff in minutes
df.query("delta<1").drop(columns='delta')
Output is merged dataframe containing ID1 and ID3.
This is probably not the perfect solution as probably could be done in single line of code by someone who is more experienced.
Upvotes: 1