gabriel11
gabriel11

Reputation: 90

Join two dataframes by the closest datetime

I have two dataframes df_A and df_B where each has date, time and a value. An example below:

import pandas as pd
df_A = pd.DataFrame({
    'date_A': ["2021-02-01", "2021-02-01", "2021-02-02"],
    'time_A': ["22:00:00", "23:00:00", "00:00:00"],
    'val_A': [100, 200, 300]})

df_B = pd.DataFrame({
    'date_B': ["2021-02-01", "2021-02-01", "2021-02-01", "2021-02-01", "2021-02-02"],
    'time_B': ["22:01:12", "22:59:34", "23:00:17", "23:59:57", "00:00:11"],
    'val_B': [104, 203, 195, 296, 294]})

I need to join this dataframes but date and time never match. So I want a left join by the closest datetime from df_B to df_A. So the output should be:

df_out =  pd.DataFrame({
    'date_A': ["2021-02-01", "2021-02-01", "2021-02-02"],
    'time_A': ["22:00:00", "23:00:00", "00:00:00"],
    'val_A': [100, 200, 300],
    'date_B': ["2021-02-01", "2021-02-01", "2021-02-01"],
    'time_B': ["22:01:12", "23:00:17", "23:59:57"],
    'val_B': [104, 195, 296]})

df_out

df_out

Upvotes: 1

Views: 354

Answers (1)

AlexK
AlexK

Reputation: 3011

Pandas has a handy merge_asof() function for these types of problems (https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html)

It requires a single key to merge on, so you can create a single date-time column in each dataframe and perform the merge:

df_A['date_time'] = pd.to_datetime(df_A.date_A + " " + df_A.time_A)
df_B['date_time'] = pd.to_datetime(df_B.date_B + " " + df_B.time_B)

# Sort the two dataframes by the new key, as required by merge_asof function
df_A.sort_values(by="date_time", inplace=True, ignore_index=True)
df_B.sort_values(by="date_time", inplace=True, ignore_index=True)

result_df = pd.merge_asof(df_A, df_B, on="date_time", direction="nearest")

Note the direction argument's value is "nearest" as you requested. There are other values you can choose, like "backward" and "forward".

Upvotes: 3

Related Questions