pratyada
pratyada

Reputation: 79

Merge Dataframe in Python with a column and time difference value of 1 minute

I am new to python, wanted a help to merge two datafames on :

a) Should have same value in a numeric field df1_UL_GTP_TEID_0_int and df2_TEID_UL_int

b) difference between df1 START_TIME_roundoff and df2 TS_START_roundoff should be 1 minute.

df1
START_TIME_roundoff UL_GTP_TEID_0_int   TIMSI/MSIN
46  2020-03-10 12:00:00 1196907781  3.240371e+09
190 2020-03-10 12:01:00 1147678181  3.244522e+09
308 2020-03-10 12:05:00 1147678181  3.244522e+09
496 2020-03-10 12:07:00 1691830165  3.252351e+09
632 2020-03-10 12:12:00 1809929829  3.237458e+09

df2
S_START_roundoff    TEID_UL_int DIR
1   2020-10-03 09:59:00 1973380469  1
2   2020-10-03 10:00:00 2041336357  2
4   2020-10-03 12:06:00 1147678181  12
5   2020-10-03 09:57:00 1295205669  1
6   2020-10-03 12:12:00 1809929829  13

<<< Expected OUTPUT >>> 
row 308 of df_1 should merge with row 4 of df_2
and row 632 of df_1 should merge with row 6 of df_2

df_1 dataframe df_2 dataframe

logic :

If absolute value of |df1.START_TIME_roundoff - df2.TS_START_roundoff| <= 1 then 
df_new = pd.merge(df_1, df_2, how='inner', left_on='UL_GTP_TEID_0_int', right_on='TEID_UL_int')

thanks in advance,

Upvotes: 0

Views: 95

Answers (1)

Zolzaya Luvsandorj
Zolzaya Luvsandorj

Reputation: 635

Using the dataframes below:

df1= pd.DataFrame({'A': [1, 2, 3],
                   'B': ['2019-01-01 10:00:00', '2019-01-02 12:20:00', '2019-01-01 10:00:00'],
                   'C': ['ID1', 'ID2', 'ID3']})
df2= pd.DataFrame({'D': ['D1', 'D2', 'D3'],
                   'E': ['2019-01-01 10:00:59', '2019-01-02 12:21:20', '2019-01-01 09:59:30'],
                   'F': ['ID1', 'ID2', 'ID3']})

If I understood your condition to join two datasets correctly, below code seems to do the job:

df = pd.merge(df1, df2, left_on='C', right_on='F', how='inner')
df['B'] = pd.to_datetime(df['B']) # Ensure it's datetime
df['E'] = pd.to_datetime(df['E']) # Ensure it's datetime
df['delta']=abs(df['B'] - df['E'])/np.timedelta64(1,'m') # Find the abs diff in minutes
df.query("delta<1").drop(columns='delta')

Output is merged dataframe containing ID1 and ID3.

This is probably not the perfect solution as probably could be done in single line of code by someone who is more experienced.

Upvotes: 1

Related Questions