Reputation: 75
I have two data frames.
ID Date
1 1999-01-01
1 2000-01-01
1 2001-01-01
2 1999-01-01
2 2000-01-01
3 1998-01-01
3 1999-01-01
4 2000-01-01
and
ID Date
1 1999-03-01
2 2000-01-01
3 1998-01-01
My goal is to subset the first data frame by ID in the second data frame. Then in each subgroup of ID, I only select the Date that is no earlier than the Date in the second data frame. For example, I only select the second and third row of ID 1 because their Dates are not earlier than 1999-03-01 in the second data frame. My desired outcome is
ID Date
1 2000-01-01
1 2001-01-01
2 2000-01-01
3 1998-01-01
3 1999-01-01
Upvotes: 0
Views: 48
Reputation: 28685
If your data frames are data tables you can use the code below to join the two tables on matching ID
s and Date
of df1
>= Date
of df2
, which will return the result directly without requiring an extra filter step.
library(data.table)
df1[df2, on = .(ID, Date >= Date), .(ID, Date = x.Date)]
# ID Date
# 1: 1 2000-01-01
# 2: 1 2001-01-01
# 3: 2 2000-01-01
# 4: 3 1998-01-01
# 5: 3 1999-01-01
Data used
df1 <- fread('
ID Date
1 1999-01-01
1 2000-01-01
1 2001-01-01
2 1999-01-01
2 2000-01-01
3 1998-01-01
3 1999-01-01
4 2000-01-01
')
df1[, Date := as.Date(Date)]
df2 <- fread('
ID Date
1 1999-03-01
2 2000-01-01
3 1998-01-01
')
df2[, Date := as.Date(Date)]
Upvotes: 2
Reputation: 178
You can do this using the dplyr
package by joining the second data frame onto the first and using a filter. If the columns have the same name, dplyr
will add .x and .y onto the column names, so your code would look something like this.
df1 %>% inner_join(df2, by = 'ID') %>% filter(Date.x >= Date.y)
Upvotes: 3