user90401
user90401

Reputation: 25

R Identify cases of overlap in time intervals within the same ID

I have a data frame with attendance data from a Zoom event containing email addresses, join time, and leave time. Many attendees log in, log out, and then log back in, and are therefore represented in multiple rows. I want to calculate how many total minutes attendees were logged in. In inspecting the data, I noticed one person where they have overlapping time intervals (see email3 in the example below), and I want to be able to identify any others in the dataset where this is the case.

Here is an example data frame already with the desired new column "overlap":

structure(list(Email= c("[email protected]", "[email protected]", "[email protected]", "[email protected]",
"[email protected]", "[email protected]"), Join.Time = structure(c(as.POSIXct("2020-12-09 13:04:00"), 
as.POSIXct("2020-12-09 13:20:00"), as.POSIXct("2020-12-09 13:30:00"),as.POSIXct("2020-12-09 13:07:00"), 
as.POSIXct("2020-12-09 13:46:00"),as.POSIXct("2020-12-09 13:29:00")), class = c("POSIXct", "POSIXt"), 
tzone = ""), Leave.Time = structure(c(as.POSIXct("2020-12-09 13:25:00"), as.POSIXct("2020-12-09 13:22:00"),
as.POSIXct("2020-12-09 14:01:00"), as.POSIXct("2020-12-09 13:29:00"), as.POSIXct("2020-12-09 14:00:00"),
as.POSIXct("2020-12-09 14:33:00")), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"), 
    Overlap = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)), .Names = c("Email", "Join.Time", "Leave.Time", "Overlap"
), row.names = c(NA, -6L), class = "data.frame")

             Email           Join.Time          Leave.Time Overlap
1 [email protected] 2020-12-09 13:04:00 2020-12-09 13:25:00   FALSE
2 [email protected] 2020-12-09 13:20:00 2020-12-09 13:22:00   FALSE
3 [email protected] 2020-12-09 13:30:00 2020-12-09 14:01:00   FALSE
4 [email protected] 2020-12-09 13:07:00 2020-12-09 13:29:00    TRUE
5 [email protected] 2020-12-09 13:46:00 2020-12-09 14:00:00    TRUE
6 [email protected] 2020-12-09 13:29:00 2020-12-09 14:33:00    TRUE

I tried to solution suggested here: R Find overlap among time periods but when I do I get the error "Error in if (int_overlaps(intervals[i], intervals[j])) { : missing value where TRUE/FALSE needed"

Would appreciate any help!!

Upvotes: 2

Views: 300

Answers (2)

Carl
Carl

Reputation: 7540

Although an older question, here's a newer option using the IV package dedicated to working with intervals:

library(tidyverse)
library(ivs)

# Provided input data
df <- structure(list(
  Email = c(
    "[email protected]", "[email protected]", "[email protected]", "[email protected]",
    "[email protected]", "[email protected]"
  ), Join.Time = structure(c(
    as.POSIXct("2020-12-09 13:04:00"),
    as.POSIXct("2020-12-09 13:20:00"), as.POSIXct("2020-12-09 13:30:00"), as.POSIXct("2020-12-09 13:07:00"),
    as.POSIXct("2020-12-09 13:46:00"), as.POSIXct("2020-12-09 13:29:00")
  ),
  class = c("POSIXct", "POSIXt"),
  tzone = ""
  ), Leave.Time = structure(c(
    as.POSIXct("2020-12-09 13:25:00"), as.POSIXct("2020-12-09 13:22:00"),
    as.POSIXct("2020-12-09 14:01:00"), as.POSIXct("2020-12-09 13:29:00"), as.POSIXct("2020-12-09 14:00:00"),
    as.POSIXct("2020-12-09 14:33:00")
  ), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"),
  Overlap = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
), .Names = c("Email", "Join.Time", "Leave.Time", "Overlap"), row.names = c(NA, -6L), class = "data.frame")

# Check for overlaps
df |> 
  mutate(iv = iv(Join.Time, Leave.Time)) |> 
  group_by(Email) |> 
  mutate(iv = iv_groups(iv)) |> 
  ungroup() |> 
  add_count(iv) |> 
  mutate(overlap = if_else(n > 1, TRUE, FALSE))
#> # A tibble: 6 × 7
#>   Email            Join.Time           Leave.Time          Overlap
#>   <chr>            <dttm>              <dttm>              <lgl>  
#> 1 [email protected] 2020-12-09 13:04:00 2020-12-09 08:25:00 FALSE  
#> 2 [email protected] 2020-12-09 13:20:00 2020-12-09 08:22:00 FALSE  
#> 3 [email protected] 2020-12-09 13:30:00 2020-12-09 09:01:00 FALSE  
#> 4 [email protected] 2020-12-09 13:07:00 2020-12-09 08:29:00 TRUE   
#> 5 [email protected] 2020-12-09 13:46:00 2020-12-09 09:00:00 TRUE   
#> 6 [email protected] 2020-12-09 13:29:00 2020-12-09 09:33:00 TRUE   
#> # … with 3 more variables: iv <iv<dttm>>, n <int>, overlap <lgl>

Created on 2022-05-27 by the reprex package (v2.0.1)

Upvotes: 0

AcidCatfish
AcidCatfish

Reputation: 324

Another option from the thread you mentioned separately counts the overlapping values and adds them in as a separate column in a separate dataframe. Try this. It worked for me. I get the same output you provided.

library(data.frame)
dt <- data.table(df, key=c("Join.Time", "Leave.Time"))[, `:=`(Overlap=NULL, row=1:nrow(df))]
overlapping <- unique(foverlaps(dt, dt)[Email==i.Email & row!=i.row, Email])
dt[, `:=`(Overlap=FALSE, row=NULL)][Email %in% overlapping, Overlap:=TRUE][order(Email, Join.Time)]

Upvotes: 1

Related Questions