I have two data frames in R.
df1 looks like this:
id time
1 2018-08-28 11:22:40
2 2018-08-28 11:35:10
3 2018-08-28 11:50:00
4 2018-08-28 11:55:30
df2 looks like this:
start_time end_time
2018-08-28 11:22:00 2018-08-28 11:22:50
2018-08-28 11:30:30 2018-08-28 11:34:10
2018-08-28 11:49:00 2018-08-28 11:52:20
2018-08-28 11:57:20 2018-08-28 11:59:40
I'm trying to select the rows from df1 whose time falls between any of the start_time and end_time pairs in df2. In the example above, that would leave me with:
id time
1 2018-08-28 11:22:40
3 2018-08-28 11:50:00
This problem is similar to the one found here, but in R instead of SQL. How can I achieve this?
Here is an option using fuzzyjoin:
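library(fuzzyjoin)
library(tidyverse)

fuzzy_left_join(
  # Convert the character columns to date-times so they compare correctly
  df1 %>% mutate(time = as.POSIXct(time)),
  df2 %>% mutate(
    start_time = as.POSIXct(start_time),
    end_time = as.POSIXct(end_time)),
  # Match when time >= start_time and time <= end_time
  by = c("time" = "start_time", "time" = "end_time"),
  match_fun = list(`>=`, `<=`)) %>%
  # Rows of df1 without a matching interval have NA in start_time
  filter(!is.na(start_time)) %>%
  select(id, time)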
# id time
#1 1 2018-08-28 11:22:40
#2 3 2018-08-28 11:50:00
Explanation: Interval join df1 and df2 (where time >= start_time & time <= end_time), then keep only the rows with no NAs in start_time. In a left join, rows of df1 that match no interval get NA in the df2 columns, so the rows that remain are exactly the entries that lie within a start_time-end_time interval.
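If you would rather avoid the extra packages, the same inclusive interval check can be written in base R. This is a minimal sketch, not part of the original answer: the helper name in_any_interval is hypothetical, and the times are parsed as UTC here (an assumption) to sidestep daylight-saving surprises.

```r
# Sample data rebuilt with proper date-time columns (UTC assumed)
df1 <- data.frame(
  id   = 1:4,
  time = as.POSIXct(c("2018-08-28 11:22:40", "2018-08-28 11:35:10",
                      "2018-08-28 11:50:00", "2018-08-28 11:55:30"), tz = "UTC")
)
df2 <- data.frame(
  start_time = as.POSIXct(c("2018-08-28 11:22:00", "2018-08-28 11:30:30",
                            "2018-08-28 11:49:00", "2018-08-28 11:57:20"), tz = "UTC"),
  end_time   = as.POSIXct(c("2018-08-28 11:22:50", "2018-08-28 11:34:10",
                            "2018-08-28 11:52:20", "2018-08-28 11:59:40"), tz = "UTC")
)

# Hypothetical helper: TRUE for each time that lies inside at least one
# [start, end] interval (both ends inclusive). Comparing plain numbers
# (seconds since the epoch) avoids POSIXct class handling inside vapply().
in_any_interval <- function(times, starts, ends) {
  t <- as.numeric(times)
  s <- as.numeric(starts)
  e <- as.numeric(ends)
  vapply(t, function(x) any(x >= s & x <= e), logical(1))
}

res <- df1[in_any_interval(df1$time, df2$start_time, df2$end_time), ]
print(res)
```

Because each row of df1 is tested once with any(), a time that falls into several overlapping intervals is kept only once, whereas the join above returns it once per matching interval.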
Sample data
df1 <- read.table(text =
"id time
1 '2018-08-28 11:22:40'
2 '2018-08-28 11:35:10'
3 '2018-08-28 11:50:00'
4 '2018-08-28 11:55:30'", header = TRUE)
df2 <- read.table(text =
"start_time end_time
'2018-08-28 11:22:00' '2018-08-28 11:22:50'
'2018-08-28 11:30:30' '2018-08-28 11:34:10'
'2018-08-28 11:49:00' '2018-08-28 11:52:20'
'2018-08-28 11:57:20' '2018-08-28 11:59:40'", header = TRUE)
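For large tables, comparing every time against every interval costs roughly the number of rows in df1 times the number of rows in df2. A base R alternative, assuming the df2 intervals do not overlap, is to sort them by start_time and use findInterval() (a binary search) to find the single candidate interval for each time. The function name select_in_intervals is hypothetical; treat this as a sketch rather than a drop-in replacement.

```r
# Sample data with proper date-time columns (UTC assumed)
df1 <- data.frame(
  id   = 1:4,
  time = as.POSIXct(c("2018-08-28 11:22:40", "2018-08-28 11:35:10",
                      "2018-08-28 11:50:00", "2018-08-28 11:55:30"), tz = "UTC")
)
df2 <- data.frame(
  start_time = as.POSIXct(c("2018-08-28 11:22:00", "2018-08-28 11:30:30",
                            "2018-08-28 11:49:00", "2018-08-28 11:57:20"), tz = "UTC"),
  end_time   = as.POSIXct(c("2018-08-28 11:22:50", "2018-08-28 11:34:10",
                            "2018-08-28 11:52:20", "2018-08-28 11:59:40"), tz = "UTC")
)

# Hypothetical helper. Assumes the intervals in `intervals` do not overlap.
select_in_intervals <- function(df, intervals) {
  # findInterval() needs the breakpoints in non-decreasing order
  ord <- order(intervals$start_time)
  s <- as.numeric(intervals$start_time[ord])
  e <- as.numeric(intervals$end_time[ord])
  t <- as.numeric(df$time)
  # Index of the last interval whose start is <= t (0 if there is none)
  idx <- findInterval(t, s)
  inside <- idx > 0
  # Keep the time only if it is also <= that interval's end (inclusive)
  inside[inside] <- t[inside] <= e[idx[inside]]
  df[inside, ]
}

res <- select_in_intervals(df1, df2)
print(res)
```

If the intervals can overlap, this can miss matches, because only the interval with the latest start_time at or before each time is checked; the join or the any() check is the safer choice in that case.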