Ethan Nguyen

Reputation: 56

Bracketing time intervals in R/tidyverse

I need to figure out how much time an interval overlaps with each of a predetermined set of hour brackets during the day.

Here's the example data:

require(tidyverse)
example <- tibble::tribble(
      ~Session,     ~SessionStartTime,       ~SessionEndTime,     ~TotalDuration,
            1L, "2020-09-10 09:05:50", "2020-09-11 05:50:02",             1244.2,
            2L, "2020-09-10 23:55:20", "2020-09-11 01:20:20",                 85,
            3L, "2020-09-11 01:11:30", "2020-09-11 02:12:02",   60.5333333333333,
            4L, "2020-09-11 02:20:30", "2020-09-11 03:00:02",   39.5333333333333
  )

And here's the output I'm looking for:

output <- tibble::tribble(
  ~Session,   ~`00:00-03:00`, ~`03:00-06:00`, ~`06:00-09:00`, ~`09:00-12:00`, ~`12:00-15:00`, ~`15:00-18:00`, ~`18:00-21:00`,   ~`21:00-00:00`,
        1L,        "3 hours",   "2.82 hours",             NA,             NA,             NA,             NA,             NA, "2.902778 hours",
        2L, "1.338889 hours",             NA,             NA,             NA,             NA,             NA,             NA,  "4.666667 mins",
        3L,  "60.53333 mins",             NA,             NA,             NA,             NA,             NA,             NA,               NA,
        4L, "2.341667 hours",    "2 seconds",             NA,             NA,             NA,             NA,             NA,               NA
  )

I've tried lubridate::interval, but I run into problems with sessions that overlap multiple brackets and sessions that span more than one day. I've also tried chron, but it doesn't handle the difference between 05:00 the next day and 23:00 the previous day.

Upvotes: 0

Views: 175

Answers (1)

Ben

Reputation: 30474

Here is something to try - though I'm sure there are better approaches than this.

You can create a function that finds the overlap between a session and each 3-hour bracket, using intersect and interval from lubridate. To force the 3-hour brackets to start at midnight, build a new sequence of 3-hour breakpoints that runs from midnight at the start of the session's first day to midnight at the end of its last day.

Once the overlaps can be determined from start and end times, you can use mapply to apply the function to each row in your example data. Since a session may span multiple days, the same bracket can appear more than once, so you can group_by the session and the bracket label (seq) and sum up. The final pivot_wider will put the data into your desired wide format. Note that the final durations are in seconds.
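To make the overlap idea concrete, here is a minimal base R sketch of the arithmetic for a single session and a single bracket. It is not the lubridate approach used in this answer, just the same calculation written out by hand; overlap_secs is a hypothetical helper name, and the times are assumed to be in UTC.

```r
# Overlap in seconds between one session and one bracket.
# overlap_secs is a hypothetical helper, not a lubridate function.
overlap_secs <- function(start, end, bracket_start, bracket_end) {
  latest_start <- pmax(as.numeric(start), as.numeric(bracket_start))
  earliest_end <- pmin(as.numeric(end), as.numeric(bracket_end))
  # A negative difference means the two ranges do not overlap at all
  pmax(earliest_end - latest_start, 0)
}

# Session 2 from the example data (UTC is an assumption)
start <- as.POSIXct("2020-09-10 23:55:20", tz = "UTC")
end   <- as.POSIXct("2020-09-11 01:20:20", tz = "UTC")

# Overlap with the 21:00-00:00 bracket of 2020-09-10
late_evening <- overlap_secs(start, end,
                             as.POSIXct("2020-09-10 21:00:00", tz = "UTC"),
                             as.POSIXct("2020-09-11 00:00:00", tz = "UTC"))

# Overlap with the 00:00-03:00 bracket of 2020-09-11
early_morning <- overlap_secs(start, end,
                              as.POSIXct("2020-09-11 00:00:00", tz = "UTC"),
                              as.POSIXct("2020-09-11 03:00:00", tz = "UTC"))
```

The two pieces add up to the 85 minutes (5100 seconds) listed as TotalDuration for session 2.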

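library(tidyverse)
library(lubridate)

get_intervals <- function(session, start, end) {
  # 3-hour breakpoints from midnight before `start` to midnight after `end`
  seq3h <- seq.POSIXt(floor_date(start, unit = "day"), 
                      ceiling_date(end, unit = "day"), 
                      "3 hours")
  # Turn consecutive breakpoints into 3-hour intervals
  seq3h_int <- interval(seq3h[-length(seq3h)], seq3h[-1])
  data.frame(
    session,
    # Bracket label such as "0_3" or "21_0"
    seq = sapply(seq3h_int, 
                 function(x) paste(hour(int_start(x)), hour(int_end(x)), sep = "_")),
    # Overlap with the session in seconds (NA when there is none)
    overlap = sapply(seq3h_int, 
                     intersect, 
                     interval(start = start, end = end))
  )
}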

# The example stores the times as character strings, so parse them first
# (ymd_hms() uses UTC unless another time zone is given)
do.call(rbind, 
        mapply(get_intervals, 
               example$Session, 
               ymd_hms(example$SessionStartTime), 
               ymd_hms(example$SessionEndTime), 
               SIMPLIFY = FALSE)
        ) %>%
  group_by(session, seq) %>%
  summarise(overlap = sum(overlap, na.rm = TRUE)) %>%
  pivot_wider(id_cols = session, 
              names_from = seq, 
              values_from = overlap, 
              names_prefix = "T")

Output

  session  T0_3 T12_15 T15_18 T18_21 T21_0  T3_6  T6_9 T9_12
    <int> <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
1       1 10800  10800  10800  10800 10800 10202     0 10450
2       2  4820      0      0      0   280     0     0     0
3       3  3632      0      0      0     0     0     0     0
4       4  2370      0      0      0     0     2     0     0
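
If you would rather have hours than seconds, and NA instead of 0 for brackets a session never touches (as in the desired output), something along these lines might work. This is only a sketch: res is a hand-typed copy of two columns from the output above rather than the real pipeline result, and rounding to two decimals is an arbitrary choice.

```r
# Two columns of the output above, re-typed by hand for illustration
res <- data.frame(session = 1:4,
                  T0_3    = c(10800, 4820, 3632, 2370),
                  T21_0   = c(10800, 280, 0, 0))

# Every column except the session id holds seconds
secs_cols <- setdiff(names(res), "session")

res_hours <- res
res_hours[secs_cols] <- lapply(res[secs_cols], function(s) {
  h <- round(s / 3600, 2)   # seconds to hours
  replace(h, h == 0, NA)    # no overlap becomes NA
})

res_hours
```

Note that a very short overlap (under about 18 seconds) rounds to 0 hours and would also turn into NA here, so apply the NA step before rounding if that matters for your data.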

Upvotes: 1
