Reputation: 90
I have two dataframes df_A
and df_B
where each has date, time and a value. An example below:
import pandas as pd
df_A = pd.DataFrame({
'date_A': ["2021-02-01", "2021-02-01", "2021-02-02"],
'time_A': ["22:00:00", "23:00:00", "00:00:00"],
'val_A': [100, 200, 300]})
df_B = pd.DataFrame({
'date_B': ["2021-02-01", "2021-02-01", "2021-02-01", "2021-02-01", "2021-02-02"],
'time_B': ["22:01:12", "22:59:34", "23:00:17", "23:59:57", "00:00:11"],
'val_B': [104, 203, 195, 296, 294]})
I need to join this dataframes but date and time never match. So I want a left join by the closest datetime from df_B
to df_A
. So the output should be:
df_out = pd.DataFrame({
'date_A': ["2021-02-01", "2021-02-01", "2021-02-02"],
'time_A': ["22:00:00", "23:00:00", "00:00:00"],
'val_A': [100, 200, 300],
'date_B': ["2021-02-01", "2021-02-01", "2021-02-01"],
'time_B': ["22:01:12", "23:00:17", "23:59:57"],
'val_B': [104, 195, 296]})
Upvotes: 1
Views: 354
Reputation: 3011
Pandas has a handy merge_asof()
function for these types of problems (https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html)
It requires a single key to merge on, so you can create a single date-time column in each dataframe and perform the merge:
df_A['date_time'] = pd.to_datetime(df_A.date_A + " " + df_A.time_A)
df_B['date_time'] = pd.to_datetime(df_B.date_B + " " + df_B.time_B)
# Sort the two dataframes by the new key, as required by merge_asof function
df_A.sort_values(by="date_time", inplace=True, ignore_index=True)
df_B.sort_values(by="date_time", inplace=True, ignore_index=True)
result_df = pd.merge_asof(df_A, df_B, on="date_time", direction="nearest")
Note the direction argument's value is "nearest" as you requested. There are other values you can choose, like "backward" and "forward".
Upvotes: 3