Extract the overlaps of time intervals for the calculation of availabilities

Question

I have a dataset on malfunction times. The start of the malfunction is the Begin column and the end is the End column.
In addition there are 3 malfunction categories, which can also overlap in time.
My actual goal is to calculate the sum of the time period per day when there is no malfunction. (The category actually does not matter, I do not need a breakdown of the category).
For better illustration, here is an example data set of the malfunction times over 3 days:

> df_time
# A tibble: 8 × 3
  Category Begin               End                
                                 
1 A        2023-07-15 01:40:11 2023-07-15 13:43:15
2 A        2023-07-16 05:54:44 2023-07-16 10:50:45
3 B        2023-08-16 07:43:09 2023-08-16 16:42:12
4 C        2023-08-16 12:00:00 2023-08-16 13:11:13
5 A        2023-08-16 18:00:00 2023-08-16 19:30:00
6 A        2023-08-17 08:00:00 2023-08-17 13:00:00
7 C        2023-08-17 11:12:45 2023-08-17 19:58:22
8 A        2023-08-17 19:01:45 2023-08-17 23:59:59

I have now tried to visualize the malfunction times graphically:

Now I want to calculate a summary per calendar day, which shows how many minutes (or hours or seconds) no single malfunction occurred.
This is the sum of all green time intervals per day:

If there would be only one malfunction category, one could simply calculate the 24 hours minus the malfunction interval.
However, I do not get further in dealing with the different overlapping malfunction intervals.
Can anyone help me in calculating the malfunction free periods and/or also the daily malfunction periods.

I have imagined such an evaluation to be somewhat: (Where the results are probably not exactly the same as the example data set above)

  Date       OK_times malfunction_times
           
1 2023-07-15 17 hours 7 hours
2 2023-07-16 5 Hours  19 hours
3 2023-07-17 3 hours  21 hours

Here is the code for creating the example data frame:

library(tidyverse)

df_time <- tibble(
  Category = c("A", "A", "B", "C", "A", "A", "C", "A"),
  Begin = as_datetime(c(
    "2023-07-15 01:40:11",
    "2023-07-16 05:54:44",
    "2023-08-16 07:43:09",
    "2023-08-16 12:00:00",
    "2023-08-16 18:00:00",
    "2023-08-17 08:00:00",
    "2023-08-17 11:12:45",
    "2023-08-17 19:01:45"
    )),
  End = as_datetime(c(
    "2023-07-15 13:43:15",
    "2023-07-16 10:50:45",
    "2023-08-16 16:42:12",
    "2023-08-16 13:11:13",
    "2023-08-16 19:30:00",
    "2023-08-17 13:00:00",
    "2023-08-17 19:58:22",
    "2023-08-17 23:59:59"
    ))
)

Mark · Accepted Answer

interval_intersects <- function(i1, i2) { 
  # check if there's an overlap between the two intervals
  between(i1$Begin, i2$Begin, i2$End) | between(i1$End, i2$Begin, i2$End) | between(i2$Begin, i1$Begin, i1$End) | between(i2$End, i1$Begin, i1$End)
}

tidy_intervals <- function(df) {
  out <- df[0,] # empty tibble with the columns of the input df
  
  while (nrow(df) > 0) {
    matched = FALSE
    if (nrow(out) != 0) { # this is so janky but R will try to loop through the dataframe, even when it has no rows, even using seq_along
      
      # for each row in out, check if it intersects with the current row
      for (j in 1:nrow(out)) {

        # if it does, update the current output row to be the minimum of the two begin times and the maximum of the two end times
        if (interval_intersects(df[1, ], out[j, ])) {
          matched = TRUE
          out[j, ] <- tibble(
            Begin = min(df[1,]$Begin, out[j,]$Begin),
            End = max(df[1,]$End, out[j,]$End)
          )
          break
        }
      }
    }
    # if the current row didn't intersect with any of the output rows, append it to the output
    if (!matched) {
      out <- out |> add_row(
        Begin = df[1,]$Begin,
        End = df[1,]$End)
    }

    # remove the current row from the input
    df <- df[-1, ]
  }
  return(out)
}

tidy_intervals(df_time |> select(-Category)) |> 
  ##### everything from here until later is from the old answer https://stackoverflow.com/a/76905774/4145280 #####
  mutate(b = as.Date(Begin), e = as.Date(End),
        # create a sequence of dates between begin and end
        days = map2(b, e, ~ seq.Date(.x, .y, by = "1 day"))) |>
  # unnest the days column into many rows
  unnest(days) |>
  # if the beginning date is the same as the date in `days`, then use the original Begin column
  # else, use `days` as a datetime
  mutate(Begin = if_else(b == days, Begin, as_datetime(days)),
         # same with End, but subtracting one minute
         End = if_else(e == days, End, as_datetime(days) + days(1) - seconds(1)), .keep = "unused") |> 
 #### new stuff starts here ####
  mutate(Date = as.Date(Begin), 
         malfunction_times = End - Begin) |>
  reframe(malfunction_times = round(sum(malfunction_times)),
          OK_times = 24 - malfunction_times, .by = Date)

Output:

# A tibble: 4 × 3
  Date       malfunction_times OK_times
                     
1 2023-07-15 12 hours          12 hours
2 2023-07-16  5 hours          19 hours
3 2023-08-16 10 hours          14 hours
4 2023-08-17 16 hours           8 hours

Extract the overlaps of time intervals for the calculation of availabilities

Answers (2)

Related Questions