Reputation: 1105
So I have a dataframe like this
DF1
ID timestamp value
1 2018-07-31 20:00:00 45555
1 2018-08-01 06:50:00 333
1 2018-08-01 07:00:00 322
2 2018-08-01 13:00:00 1222
2 2018-08-01 22:00:00 1111
3 2018-08-01 11:00:00 0
3 2018-08-02 08:00:00 22222
And another one like this
DF2
startTime endTime ID
2018-08-01 06:45:00 2018-08-01 09:49:00 1
2019-12-02 11:01:00 2019-12-02 11:02:00 2
2018-08-02 07:59:00 2018-08-02 08:50:00 2
2018-08-02 07:59:00 2018-08-02 08:50:00 3
So I want to create a new column using mutate
which mutates DF2 that tells me if any time in between the startTime
and endTime
appears in DF1 that matches the ID.
So in DF2 if we have a date like 8/2/2018 and the time span is 7:59AM - 8:50AM , if any timestamp within this interval inclusive (2018-08-02 7:59:00, 2018-08-02 8:00:00, 2018-08-02 8:01:00...) appears in DF1 then a variable in DF2 reads 1, else 0. This is only the case if the ID
matches. Does not matter how many times, atleast once is what matters.
Final version of DF2 should look like.
startTime endTime ID match
2018-08-01 06:45:00 2018-08-01 09:49:00 1 1
2019-12-02 11:01:00 2019-12-02 11:02:00 2 0
2018-08-02 07:59:00 2018-08-02 08:50:00 2 0
2018-08-02 07:59:00 2018-08-02 08:50:00 3 1
Ideally I would like to use BETWEEN()
and remain in the tidyverse.
Upvotes: 3
Views: 408
Reputation: 1428
Per your request for a tidyverse solution, you can also do this by first joining the two dfs by 'ID', then using lubridate's %within%
and interval
(%--%
) functions to find matches:
library(dplyr)
library(lubridate)
df_both <- left_join(DF2, DF1, by = "ID")
df_both %>%
group_by(ID, startTime, endTime) %>%
summarize(match = any(timestamp %within% (startTime %--% endTime)),
.groups = "drop")
#> # A tibble: 4 x 4
#> ID startTime endTime match
#> <int> <dttm> <dttm> <lgl>
#> 1 1 2018-08-01 06:45:00 2018-08-01 09:49:00 TRUE
#> 2 2 2018-08-02 07:59:00 2018-08-02 08:50:00 FALSE
#> 3 2 2019-12-02 12:01:00 2019-12-02 12:02:00 FALSE
#> 4 3 2018-08-02 07:59:00 2018-08-02 08:50:00 TRUE
If you prefer 1/0 instead of TRUE/FALSE, you can use as.integer
to convert the 'match' column as such.
Upvotes: 1
Reputation: 886948
We could use a data.table
non-equi join on the DateTime columns along with 'ID' column, create the match
by checking if the number of rows (.N
) is greater than 0
library(data.table)
DF2$match <- 0L
setDT(DF2)[DF1, match := +(.N > 0),on = .(ID,
startTime < timestamp, endTime > timestamp), by = .EACHI]
DF2
# startTime endTime ID match
#1: 2018-08-01 06:45:00 2018-08-01 09:49:00 1 1
#2: 2019-12-02 11:01:00 2019-12-02 11:02:00 2 0
#3: 2018-08-02 07:59:00 2018-08-02 08:50:00 2 0
#4: 2018-08-02 07:59:00 2018-08-02 08:50:00 3 1
DF1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
timestamp = structure(c(1533085200,
1533124200, 1533124800, 1533146400, 1533178800, 1533139200, 1533214800
), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(45555L,
333L, 322L, 1222L, 1111L, 0L, 22222L)), row.names = c(NA, -7L
), class = "data.frame")
DF2 <- structure(list(startTime = structure(c(1533123900, 1575306060,
1533214740, 1533214740), class = c("POSIXct", "POSIXt"), tzone = ""),
endTime = structure(c(1533134940, 1575306120, 1533217800,
1533217800), class = c("POSIXct", "POSIXt"), tzone = ""),
ID = c(1L, 2L, 2L, 3L)), row.names = c(NA, -4L), class = "data.frame")
Upvotes: 3