Reputation: 3
So I have two dataframes, "df1" and "df2" (see code below). I want to create a new variable in "df1" by iterating trough the elDate variable in "df2", and picking the closest elDate value above the date value in "df1".
For example, the first row in "df1" is Albania with the date 1996-12-24. I want the value in the new variable to be 1997-06-29, which is the closest elDate in "df2" above date in "df1".
The main thing is to stop at the first elDate which is above date.
library(tidyverse)
library(lubridate)
df1 <- tibble(country = c(rep("Albania", 11), rep("Algeria", 7)),
date = ymd("1996-12-24", "1997-01-30", "2009-07-30", "2011-07-08", "2012-04-18", "2012-01-20", "2013-05-16", "2016-03-03", "2017-05-11", "2018-09-07", "2022-05-31", "2005-01-10", "2006-07-12", "2012-10-09", "2012-11-15", "2014-04-18", "2017-06-07", "2017-01-24"))
df2 <- tibble(country = c(rep("Albania", 9), rep("Algeria", 6)),
elDate = ymd("2025-04-25", "2021-04-25", "2017-06-25", "2013-06-25", "2009-06-28", "2005-07-03", "2001-07-08", "1997-06-29", "1996-05-26", "2021-06-12", "2017-05-04", "2012-05-10", "2007-05-17", "2007-05-30", "2002-05-30"))
I want the new dataframe to look like this:
country | date | elDate |
---|---|---|
Albania | 24-12-1996 | 29-06-1997 |
Albania | 30-01-1997 | 29-06-1997 |
Albania | 30-07-2009 | 25-06-2013 |
Albania | 08-07-2011 | 25-06-2013 |
Albania | 18-04-2012 | 25-06-2013 |
Albania | 20-01-2012 | 25-06-2013 |
Albania | 16-05-2013 | 25-06-2013 |
Albania | 03-03-2016 | 25-06-2017 |
Albania | 11-05-2017 | 25-06-2017 |
Albania | 07-09-2018 | 25-04-2021 |
Albania | 31-05-2022 | 24-04-2025 |
Algeria | 10-01-2005 | 30-05-2007 |
Algeria | 12-07-2006 | 30-05-2007 |
Algeria | 09-10-2012 | 04-05-2017 |
Algeria | 15-11-2012 | 04-05-2017 |
Algeria | 18-04-2014 | 04-05-2017 |
Algeria | 07-06-2017 | 12-06-2021 |
Algeria | 24-01-2017 | 12-06-2021 |
Algeria | 07-06-2017 | 12-06-2021 |
Suggestions are much appreciated!
Thanks!
Upvotes: 0
Views: 39
Reputation: 51994
You can try inequality joins in dplyr
:
library(dplyr) #1.1.0 and above
df1 %>%
left_join(df2, by = join_by(country, closest(date <= elDate)))
# A tibble: 18 × 3
country date elDate
<chr> <date> <date>
1 Albania 1996-12-24 1997-06-29
2 Albania 1997-01-30 1997-06-29
3 Albania 2009-07-30 2013-06-25
4 Albania 2011-07-08 2013-06-25
5 Albania 2012-04-18 2013-06-25
6 Albania 2012-01-20 2013-06-25
7 Albania 2013-05-16 2013-06-25
8 Albania 2016-03-03 2017-06-25
9 Albania 2017-05-11 2017-06-25
10 Albania 2018-09-07 2021-04-25
11 Albania 2022-05-31 2025-04-25
12 Algeria 2005-01-10 2007-05-17
13 Algeria 2006-07-12 2007-05-17
14 Algeria 2012-10-09 2017-05-04
15 Algeria 2012-11-15 2017-05-04
16 Algeria 2014-04-18 2017-05-04
17 Algeria 2017-06-07 2021-06-12
18 Algeria 2017-01-24 2017-05-04
Upvotes: 1