Reputation: 59
I have two dataframes, both indexed by datetime. For example, dataframe 1 looks like this:
|date | value |
+----------------+--------+
|2021-11-11 09:00| 1 |
|2021-11-11 10:00| 1 |
|2021-11-12 11:00| 2 |
|2021-11-14 09:00| 2 |
|2021-11-15 09:30| 3 |
and dataframe 2 looks like this:
|date | value |
+----------------+--------+
|2021-11-10 11:00| 2 |
|2021-11-11 09:30| 3 |
|2021-11-12 12:00| 4 |
|2021-11-13 09:50| 2 |
|2021-11-15 10:30| 3 |
For each entry in dataframe 1, I want to find the most recent entry in dataframe 2, and add a new column to dataframe 1 that records the relationship between the two dataframes.
To make this clearer, the expected result should look like this:
|date | value | df2_index |
+----------------+--------+----------------+
|2021-11-11 09:00| 1 |2021-11-10 11:00|
|2021-11-11 10:00| 1 |2021-11-11 09:30|
|2021-11-12 11:00| 2 |2021-11-11 09:30|
|2021-11-14 09:00| 2 |2021-11-13 09:50|
|2021-11-15 09:30| 3 |2021-11-13 09:50|
For the first entry in dataframe 1, 2021-11-11 09:00, the most recent entry in dataframe 2 is 2021-11-10 11:00. For the third entry, 2021-11-12 11:00, the most recent entry (that is, the largest timestamp in dataframe 2 that does not exceed 2021-11-12 11:00) is 2021-11-11 09:30.
Is there a pandas method that can do this efficiently?
Many thanks.
Upvotes: 1
Views: 40
Reputation: 28644
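pandas merge_asof should suffice. It requires both frames to be sorted by the key, and date has to be a regular column here (call reset_index() first if it is the index):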
pd.merge_asof(df1, df2.assign(df2_index = df2.date), on = 'date')
date value_x value_y df2_index
0 2021-11-11 09:00:00 1 2 2021-11-10 11:00:00
1 2021-11-11 10:00:00 1 3 2021-11-11 09:30:00
2 2021-11-12 11:00:00 2 3 2021-11-11 09:30:00
3 2021-11-14 09:00:00 2 2 2021-11-13 09:50:00
4 2021-11-15 09:30:00 3 2 2021-11-13 09:50:00
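Since the question says both frames are indexed by datetime, here is a minimal self-contained sketch of the same idea that joins on the index instead of a date column. The frames are rebuilt from the sample data in the question. The names lookup and result are only illustrative, and direction="backward" (the default) is written out to make the "most recent entry not later than" rule explicit.

```python
import pandas as pd

# Rebuild the sample frames from the question, indexed by datetime.
df1 = pd.DataFrame(
    {"value": [1, 1, 2, 2, 3]},
    index=pd.to_datetime([
        "2021-11-11 09:00", "2021-11-11 10:00", "2021-11-12 11:00",
        "2021-11-14 09:00", "2021-11-15 09:30",
    ]),
)
df1.index.name = "date"

df2 = pd.DataFrame(
    {"value": [2, 3, 4, 2, 3]},
    index=pd.to_datetime([
        "2021-11-10 11:00", "2021-11-11 09:30", "2021-11-12 12:00",
        "2021-11-13 09:50", "2021-11-15 10:30",
    ]),
)
df2.index.name = "date"

# Copy df2's index into a one-column frame so the matched timestamp
# survives the merge as a normal column (hypothetical helper name).
lookup = df2.index.to_series().rename("df2_index").to_frame()

# merge_asof needs both sides sorted on the key; sort_index() is a no-op
# for this sample data but guards against unsorted input.
result = pd.merge_asof(
    df1.sort_index(),
    lookup.sort_index(),
    left_index=True,
    right_index=True,
    direction="backward",  # last df2 timestamp <= each df1 timestamp
)

print(result)
```

Merging only the lookup frame rather than all of df2 avoids the value_x / value_y suffixes, so df1 keeps its original value column. If a df1 timestamp is earlier than everything in df2, df2_index would be NaT for that row.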
Upvotes: 1