Reputation: 25
I have a data frame with attendance data from a Zoom event containing email addresses, join time, and leave time. Many attendees log in, log out, and then log back in, and are therefore represented in multiple rows. I want to calculate how many total minutes attendees were logged in. In inspecting the data, I noticed one person where they have overlapping time intervals (see email3 in the example below), and I want to be able to identify any others in the dataset where this is the case.
Here is an example data frame already with the desired new column "overlap":
structure(list(Email= c("[email protected]", "[email protected]", "[email protected]", "[email protected]",
"[email protected]", "[email protected]"), Join.Time = structure(c(as.POSIXct("2020-12-09 13:04:00"),
as.POSIXct("2020-12-09 13:20:00"), as.POSIXct("2020-12-09 13:30:00"),as.POSIXct("2020-12-09 13:07:00"),
as.POSIXct("2020-12-09 13:46:00"),as.POSIXct("2020-12-09 13:29:00")), class = c("POSIXct", "POSIXt"),
tzone = ""), Leave.Time = structure(c(as.POSIXct("2020-12-09 13:25:00"), as.POSIXct("2020-12-09 13:22:00"),
as.POSIXct("2020-12-09 14:01:00"), as.POSIXct("2020-12-09 13:29:00"), as.POSIXct("2020-12-09 14:00:00"),
as.POSIXct("2020-12-09 14:33:00")), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"),
Overlap = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)), .Names = c("Email", "Join.Time", "Leave.Time", "Overlap"
), row.names = c(NA, -6L), class = "data.frame")
Email Join.Time Leave.Time Overlap
1 [email protected] 2020-12-09 13:04:00 2020-12-09 13:25:00 FALSE
2 [email protected] 2020-12-09 13:20:00 2020-12-09 13:22:00 FALSE
3 [email protected] 2020-12-09 13:30:00 2020-12-09 14:01:00 FALSE
4 [email protected] 2020-12-09 13:07:00 2020-12-09 13:29:00 TRUE
5 [email protected] 2020-12-09 13:46:00 2020-12-09 14:00:00 TRUE
6 [email protected] 2020-12-09 13:29:00 2020-12-09 14:33:00 TRUE
I tried to solution suggested here: R Find overlap among time periods but when I do I get the error "Error in if (int_overlaps(intervals[i], intervals[j])) { : missing value where TRUE/FALSE needed"
Would appreciate any help!!
Upvotes: 2
Views: 300
Reputation: 7540
Although an older question, here's a newer option using the IV package dedicated to working with intervals:
library(tidyverse)
library(ivs)
# Provided input data
df <- structure(list(
Email = c(
"[email protected]", "[email protected]", "[email protected]", "[email protected]",
"[email protected]", "[email protected]"
), Join.Time = structure(c(
as.POSIXct("2020-12-09 13:04:00"),
as.POSIXct("2020-12-09 13:20:00"), as.POSIXct("2020-12-09 13:30:00"), as.POSIXct("2020-12-09 13:07:00"),
as.POSIXct("2020-12-09 13:46:00"), as.POSIXct("2020-12-09 13:29:00")
),
class = c("POSIXct", "POSIXt"),
tzone = ""
), Leave.Time = structure(c(
as.POSIXct("2020-12-09 13:25:00"), as.POSIXct("2020-12-09 13:22:00"),
as.POSIXct("2020-12-09 14:01:00"), as.POSIXct("2020-12-09 13:29:00"), as.POSIXct("2020-12-09 14:00:00"),
as.POSIXct("2020-12-09 14:33:00")
), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"),
Overlap = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
), .Names = c("Email", "Join.Time", "Leave.Time", "Overlap"), row.names = c(NA, -6L), class = "data.frame")
# Check for overlaps
df |>
mutate(iv = iv(Join.Time, Leave.Time)) |>
group_by(Email) |>
mutate(iv = iv_groups(iv)) |>
ungroup() |>
add_count(iv) |>
mutate(overlap = if_else(n > 1, TRUE, FALSE))
#> # A tibble: 6 × 7
#> Email Join.Time Leave.Time Overlap
#> <chr> <dttm> <dttm> <lgl>
#> 1 [email protected] 2020-12-09 13:04:00 2020-12-09 08:25:00 FALSE
#> 2 [email protected] 2020-12-09 13:20:00 2020-12-09 08:22:00 FALSE
#> 3 [email protected] 2020-12-09 13:30:00 2020-12-09 09:01:00 FALSE
#> 4 [email protected] 2020-12-09 13:07:00 2020-12-09 08:29:00 TRUE
#> 5 [email protected] 2020-12-09 13:46:00 2020-12-09 09:00:00 TRUE
#> 6 [email protected] 2020-12-09 13:29:00 2020-12-09 09:33:00 TRUE
#> # … with 3 more variables: iv <iv<dttm>>, n <int>, overlap <lgl>
Created on 2022-05-27 by the reprex package (v2.0.1)
Upvotes: 0
Reputation: 324
Another option from the thread you mentioned separately counts the overlapping values and adds them in as a separate column in a separate dataframe. Try this. It worked for me. I get the same output you provided.
library(data.frame)
dt <- data.table(df, key=c("Join.Time", "Leave.Time"))[, `:=`(Overlap=NULL, row=1:nrow(df))]
overlapping <- unique(foverlaps(dt, dt)[Email==i.Email & row!=i.row, Email])
dt[, `:=`(Overlap=FALSE, row=NULL)][Email %in% overlapping, Overlap:=TRUE][order(Email, Join.Time)]
Upvotes: 1