R Identify cases of overlap in time intervals within the same ID

Question

I have a data frame with attendance data from a Zoom event containing email addresses, join times, and leave times. Many attendees log in, log out, and then log back in, so they appear in multiple rows. I want to calculate the total number of minutes each attendee was logged in. While inspecting the data, I noticed one person with overlapping time intervals (see email3 in the example below), and I want to identify any others in the dataset where this is the case.

Here is an example data frame already with the desired new column "overlap":

structure(list(Email= c("email1@gmail.com", "email2@gmail.com", "email2@gmail.com", "email3@gmail.com",
"email3@gmail.com", "email3@gmail.com"), Join.Time = structure(c(as.POSIXct("2020-12-09 13:04:00"), 
as.POSIXct("2020-12-09 13:20:00"), as.POSIXct("2020-12-09 13:30:00"),as.POSIXct("2020-12-09 13:07:00"), 
as.POSIXct("2020-12-09 13:46:00"),as.POSIXct("2020-12-09 13:29:00")), class = c("POSIXct", "POSIXt"), 
tzone = ""), Leave.Time = structure(c(as.POSIXct("2020-12-09 13:25:00"), as.POSIXct("2020-12-09 13:22:00"),
as.POSIXct("2020-12-09 14:01:00"), as.POSIXct("2020-12-09 13:29:00"), as.POSIXct("2020-12-09 14:00:00"),
as.POSIXct("2020-12-09 14:33:00")), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"), 
    Overlap = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)), .Names = c("Email", "Join.Time", "Leave.Time", "Overlap"
), row.names = c(NA, -6L), class = "data.frame")

             Email           Join.Time          Leave.Time Overlap
1 email1@gmail.com 2020-12-09 13:04:00 2020-12-09 13:25:00   FALSE
2 email2@gmail.com 2020-12-09 13:20:00 2020-12-09 13:22:00   FALSE
3 email2@gmail.com 2020-12-09 13:30:00 2020-12-09 14:01:00   FALSE
4 email3@gmail.com 2020-12-09 13:07:00 2020-12-09 13:29:00    TRUE
5 email3@gmail.com 2020-12-09 13:46:00 2020-12-09 14:00:00    TRUE
6 email3@gmail.com 2020-12-09 13:29:00 2020-12-09 14:33:00    TRUE

I tried the solution suggested here: R Find overlap among time periods, but when I do, I get the error "Error in if (int_overlaps(intervals[i], intervals[j])) { : missing value where TRUE/FALSE needed"

Would appreciate any help!!

AcidCatfish · Accepted Answer

Another option from the thread you mentioned finds the overlapping rows separately and then adds the result as a column in a separate data.table. Try this. It worked for me, and I get the same output you provided.

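library(data.table)
# Key on the interval columns (foverlaps needs this), drop the hand-made Overlap column, add a row id
dt <- data.table(df, key=c("Join.Time", "Leave.Time"))[, `:=`(Overlap=NULL, row=1:nrow(df))]
# Self-join on overlapping intervals; keep pairs with the same Email that are different rows
overlapping <- unique(foverlaps(dt, dt)[Email==i.Email & row!=i.row, Email])
# Flag every row of any attendee that has at least one overlap, then print sorted
dt[, `:=`(Overlap=FALSE, row=NULL)][Email %in% overlapping, Overlap:=TRUE][order(Email, Join.Time)]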
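
If you want to cross-check the flag without extra packages, here is a minimal base R sketch. It is not from the linked thread; the helper names has_overlap and union_minutes are made up for illustration. It assumes that an interval ending exactly when the next one starts counts as an overlap (as in your expected output for email3), and it rebuilds your example data with tz = "UTC" so the result does not depend on the local time zone. The second helper is one possible way to get the total minutes per attendee without double-counting overlapping sessions.

```r
# Example data from the question; tz = "UTC" is an assumption made for this sketch
df <- data.frame(
  Email = c("email1@gmail.com", "email2@gmail.com", "email2@gmail.com",
            "email3@gmail.com", "email3@gmail.com", "email3@gmail.com"),
  Join.Time = as.POSIXct(c("2020-12-09 13:04:00", "2020-12-09 13:20:00",
                           "2020-12-09 13:30:00", "2020-12-09 13:07:00",
                           "2020-12-09 13:46:00", "2020-12-09 13:29:00"),
                         tz = "UTC"),
  Leave.Time = as.POSIXct(c("2020-12-09 13:25:00", "2020-12-09 13:22:00",
                            "2020-12-09 14:01:00", "2020-12-09 13:29:00",
                            "2020-12-09 14:00:00", "2020-12-09 14:33:00"),
                          tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if any two intervals of one attendee overlap.
# Touching endpoints (leave == next join) are treated as overlapping.
has_overlap <- function(join, leave) {
  if (length(join) < 2) return(FALSE)
  o <- order(join)
  join <- as.numeric(join[o])
  leave <- as.numeric(leave[o])
  # Compare each later join time with the latest leave time seen before it
  any(join[-1] <= cummax(leave)[-length(leave)])
}

# Hypothetical helper: minutes covered by the union of one attendee's intervals
union_minutes <- function(join, leave) {
  o <- order(join)
  join <- as.numeric(join[o])
  leave <- as.numeric(leave[o])
  prev_end <- c(-Inf, cummax(leave)[-length(leave)])
  # Count only the part of each interval not already covered by earlier ones
  sum(pmax(leave - pmax(join, prev_end), 0)) / 60
}

groups <- split(df, df$Email)

# One flag per attendee, copied to every row of that attendee
flag <- sapply(groups, function(d) has_overlap(d$Join.Time, d$Leave.Time))
df$Overlap <- unname(flag[df$Email])

# Total logged-in minutes per attendee
minutes <- sapply(groups, function(d) union_minutes(d$Join.Time, d$Leave.Time))

print(df)
print(minutes)  # email1: 21, email2: 33, email3: 86
```

Like the data.table version, this flags every row of an attendee who has at least one overlap, not just the specific rows that collide. If touching sessions should not count as overlapping, changing `<=` to `<` in has_overlap should do it.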
