TobKel
TobKel

Reputation: 1453

Extract the overlaps of time intervals for the calculation of availabilities

I have a dataset on malfunction times. The start of the malfunction is the Begin column and the end is the End column.
In addition there are 3 malfunction categories, which can also overlap in time.
My actual goal is to calculate the sum of the time period per day when there is no malfunction. (The category actually does not matter, I do not need a breakdown of the category).
For better illustration, here is an example data set of the malfunction times over 3 days:

> df_time
# A tibble: 8 × 3
  Category Begin               End                
  <chr>    <dttm>              <dttm>             
1 A        2023-07-15 01:40:11 2023-07-15 13:43:15
2 A        2023-07-16 05:54:44 2023-07-16 10:50:45
3 B        2023-08-16 07:43:09 2023-08-16 16:42:12
4 C        2023-08-16 12:00:00 2023-08-16 13:11:13
5 A        2023-08-16 18:00:00 2023-08-16 19:30:00
6 A        2023-08-17 08:00:00 2023-08-17 13:00:00
7 C        2023-08-17 11:12:45 2023-08-17 19:58:22
8 A        2023-08-17 19:01:45 2023-08-17 23:59:59

I have now tried to visualize the malfunction times graphically: enter image description here

Now I want to calculate a summary per calendar day, which shows how many minutes (or hours or seconds) no single malfunction occurred.
This is the sum of all green time intervals per day: enter image description here

If there would be only one malfunction category, one could simply calculate the 24 hours minus the malfunction interval.
However, I do not get further in dealing with the different overlapping malfunction intervals.
Can anyone help me in calculating the malfunction free periods and/or also the daily malfunction periods.

I have imagined such an evaluation to be somewhat: (Where the results are probably not exactly the same as the example data set above)

  Date       OK_times malfunction_times
  <date>     <chr>    <chr>
1 2023-07-15 17 hours 7 hours
2 2023-07-16 5 Hours  19 hours
3 2023-07-17 3 hours  21 hours

Here is the code for creating the example data frame:

library(tidyverse)

df_time <- tibble(
  Category = c("A", "A", "B", "C", "A", "A", "C", "A"),
  Begin = as_datetime(c(
    "2023-07-15 01:40:11",
    "2023-07-16 05:54:44",
    "2023-08-16 07:43:09",
    "2023-08-16 12:00:00",
    "2023-08-16 18:00:00",
    "2023-08-17 08:00:00",
    "2023-08-17 11:12:45",
    "2023-08-17 19:01:45"
    )),
  End = as_datetime(c(
    "2023-07-15 13:43:15",
    "2023-07-16 10:50:45",
    "2023-08-16 16:42:12",
    "2023-08-16 13:11:13",
    "2023-08-16 19:30:00",
    "2023-08-17 13:00:00",
    "2023-08-17 19:58:22",
    "2023-08-17 23:59:59"
    ))
)

Upvotes: 4

Views: 111

Answers (2)

Johannes Titz
Johannes Titz

Reputation: 1001

May I suggest an alternative that is more intuitive, but less efficient. The main idea is to create all intervals as vectors of seconds, then get rid of the duplicates (overlaps) and finally check for each day how many seconds of malfunctioning fall into this day:

# get days as numerics
begin_day <- as.numeric(as_datetime(date(df_time$Begin)))
days <- unique(begin_day)

# get times as numerics
begin_time <- as.numeric(df_time$Begin)
end_time <- as.numeric(df_time$End)

# create intervals as vectors of seconds and make them unique
# in other words: all overlaps are merged
intervals <- Map(function(x, y) x:y, begin_time, end_time)
intervals_unique <- unique(unlist(intervals))

# now we simply check how many seconds of a day overlap with the intervals
res <- sapply(days, function(x) sum(intervals_unique >= x & intervals_unique < x + 24 * 60 * 60))

data.frame(date = as_datetime(days), malfunction = res / (60 * 60))

Producing:

        date malfunction
1 2023-07-15   12.051389
2 2023-07-16    4.933889
3 2023-08-16   10.484722
4 2023-08-17   16.000000

It could be made more efficient by filtering the data frame for a specific day and then applying the procedure. This would make the intervals_unique vector smaller. Here, I just wanted to demonstrate the general idea. The results seam to align with Mark's solution.

Upvotes: 1

Mark
Mark

Reputation: 12548

interval_intersects <- function(i1, i2) { 
  # check if there's an overlap between the two intervals
  between(i1$Begin, i2$Begin, i2$End) | between(i1$End, i2$Begin, i2$End) | between(i2$Begin, i1$Begin, i1$End) | between(i2$End, i1$Begin, i1$End)
}

tidy_intervals <- function(df) {
  out <- df[0,] # empty tibble with the columns of the input df
  
  while (nrow(df) > 0) {
    matched = FALSE
    if (nrow(out) != 0) { # this is so janky but R will try to loop through the dataframe, even when it has no rows, even using seq_along
      
      # for each row in out, check if it intersects with the current row
      for (j in 1:nrow(out)) {

        # if it does, update the current output row to be the minimum of the two begin times and the maximum of the two end times
        if (interval_intersects(df[1, ], out[j, ])) {
          matched = TRUE
          out[j, ] <- tibble(
            Begin = min(df[1,]$Begin, out[j,]$Begin),
            End = max(df[1,]$End, out[j,]$End)
          )
          break
        }
      }
    }
    # if the current row didn't intersect with any of the output rows, append it to the output
    if (!matched) {
      out <- out |> add_row(
        Begin = df[1,]$Begin,
        End = df[1,]$End)
    }

    # remove the current row from the input
    df <- df[-1, ]
  }
  return(out)
}

tidy_intervals(df_time |> select(-Category)) |> 
  ##### everything from here until later is from the old answer https://stackoverflow.com/a/76905774/4145280 #####
  mutate(b = as.Date(Begin), e = as.Date(End),
        # create a sequence of dates between begin and end
        days = map2(b, e, ~ seq.Date(.x, .y, by = "1 day"))) |>
  # unnest the days column into many rows
  unnest(days) |>
  # if the beginning date is the same as the date in `days`, then use the original Begin column
  # else, use `days` as a datetime
  mutate(Begin = if_else(b == days, Begin, as_datetime(days)),
         # same with End, but subtracting one minute
         End = if_else(e == days, End, as_datetime(days) + days(1) - seconds(1)), .keep = "unused") |> 
 #### new stuff starts here ####
  mutate(Date = as.Date(Begin), 
         malfunction_times = End - Begin) |>
  reframe(malfunction_times = round(sum(malfunction_times)),
          OK_times = 24 - malfunction_times, .by = Date)

Output:

# A tibble: 4 × 3
  Date       malfunction_times OK_times
  <date>     <drtn>            <drtn>  
1 2023-07-15 12 hours          12 hours
2 2023-07-16  5 hours          19 hours
3 2023-08-16 10 hours          14 hours
4 2023-08-17 16 hours           8 hours

Upvotes: 1

Related Questions