Chris Ruehlemann

Reputation: 21432

How to fill missing time intervals

I have a dataframe with measurements taken at different intervals:

df <- data.frame(
  A_aoi = c("C", "C", "C", "B"),
  starttime_ms = c(49, 1981, 6847, 7180),
  endtime_ms = c(1981, 6115, 7048, 10080)
)

Sometimes the intervals are completely contiguous, i.e., the starttime_ms of the next measurement equals the endtime_ms of the prior measurement. More often, however, there are gaps between the intervals. I need to insert a row into df wherever there is such a gap; the row should state when the gap starts and when it ends. The closest I have come to a solution so far is detecting the gap and measuring its duration:

library(dplyr)
# Compare each interval's end with the start of the next one (lead, not lag)
df$gap <- ifelse(lead(df$starttime_ms, 1) == df$endtime_ms,
                 NA,
                 lead(df$starttime_ms, 1) - df$endtime_ms)

However, that's still far from the desired output:

   A_aoi starttime_ms endtime_ms 
1     C           49        1981
2     C         1981        6115
3    NA         6115        6847
4     C         6847        7048
5    NA         7048        7180
6     B         7180       10080

Upvotes: 0

Views: 161

Answers (2)

B. Christian Kamgang

Reputation: 6509

You could use the data.table package as follows:

library(data.table)

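# All distinct interval boundaries, in ascending order (setDT converts df by reference)
unq <- sort(unique(setDT(df)[, c(starttime_ms, endtime_ms)]))

# Join df onto every pair of consecutive boundaries; pairs with no match are the gaps (NA)
df[.(unq[-length(unq)], unq[-1]), on = c("starttime_ms", "endtime_ms")]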

# A_aoi starttime_ms endtime_ms     
#     C           49       1981    
#     C         1981       6115     
#  <NA>         6115       6847    
#     C         6847       7048   
#  <NA>         7048       7180    
#     B         7180      10080   
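
For readers without data.table, the same idea can be unpacked step by step in base R. This is only a sketch under the assumption that the intervals in df never overlap; the names breaks, all_intervals and filled are made up for illustration and do not come from the answer above.

```r
df <- data.frame(
  A_aoi = c("C", "C", "C", "B"),
  starttime_ms = c(49, 1981, 6847, 7180),
  endtime_ms = c(1981, 6115, 7048, 10080)
)

# All distinct boundaries, in ascending order
breaks <- sort(unique(c(df$starttime_ms, df$endtime_ms)))

# Each pair of consecutive boundaries is one interval (measured or gap)
all_intervals <- data.frame(
  starttime_ms = head(breaks, -1),
  endtime_ms   = tail(breaks, -1)
)

# Left join: intervals with no match in df are the gaps and get NA in A_aoi
filled <- merge(all_intervals, df,
                by = c("starttime_ms", "endtime_ms"), all.x = TRUE)

# Restore chronological order and the original column order
filled <- filled[order(filled$starttime_ms),
                 c("A_aoi", "starttime_ms", "endtime_ms")]
filled
```

On the sample data this should give six rows, with NA in A_aoi for the two gap intervals.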

Upvotes: 1

Yuriy Saraykin

Reputation: 8880

df <- data.frame(
  A_aoi = c("C", "C", "C", "B"),
  starttime_ms = c(49, 1981, 6847, 7180),
  endtime_ms = c(1981, 6115, 7048, 10080)
)
df
#>   A_aoi starttime_ms endtime_ms
#> 1     C           49       1981
#> 2     C         1981       6115
#> 3     C         6847       7048
#> 4     B         7180      10080


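# All distinct interval boundaries (every column except A_aoi), in ascending order
x <- sort(unique(unlist(df[-1])))

# Every pair of consecutive boundaries is a candidate interval
df_int <- data.frame(starttime_ms = x[-length(x)], endtime_ms = x[-1])

library(tidyverse)
# Left join keeps all candidate intervals; those missing from df get NA in A_aoi
left_join(df_int, df, by = c("starttime_ms", "endtime_ms")) %>% 
  relocate(A_aoi, everything())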
#>   A_aoi starttime_ms endtime_ms
#> 1     C           49       1981
#> 2     C         1981       6115
#> 3  <NA>         6115       6847
#> 4     C         6847       7048
#> 5  <NA>         7048       7180
#> 6     B         7180      10080

Created on 2021-03-03 by the reprex package (v1.0.0)
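
Note that the join on consecutive boundaries only works if every measured interval matches exactly one pair of neighbouring boundaries, i.e. if the intervals do not overlap. A possible alternative, closer to the lead()-style attempt in the question, is to build the gap rows directly and bind them to the data. The following base R sketch is not part of the answer above; df_sorted, next_start, is_gap, gaps and out are hypothetical names, and it has only been reasoned through for the sample data.

```r
df <- data.frame(
  A_aoi = c("C", "C", "C", "B"),
  starttime_ms = c(49, 1981, 6847, 7180),
  endtime_ms = c(1981, 6115, 7048, 10080)
)

# Make sure the rows are in chronological order
df_sorted <- df[order(df$starttime_ms), ]

# Start time of the following row (NA for the last row)
next_start <- c(df_sorted$starttime_ms[-1], NA)

# A gap exists where the next interval starts after the current one ends
is_gap <- !is.na(next_start) & next_start > df_sorted$endtime_ms

# One row per gap: it starts at the current end and stops at the next start
gaps <- data.frame(
  A_aoi = rep(NA_character_, sum(is_gap)),
  starttime_ms = df_sorted$endtime_ms[is_gap],
  endtime_ms = next_start[is_gap]
)

# Combine measured rows and gap rows, then sort chronologically
out <- rbind(df_sorted, gaps)
out <- out[order(out$starttime_ms), ]
rownames(out) <- NULL
out
```

Because the gap rows are constructed rather than joined, rows of df are never dropped, and the gap duration is available as endtime_ms - starttime_ms on the NA rows if needed.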

Upvotes: 1
