Knorth

Reputation: 1

How to insert new rows for missing data with intervals that could vary by a few minutes in R

I would like to insert rows where data are missing in a glucose sensor dataset recorded at 5-minute intervals. I have managed this using the tsibble package, but the timestamps can drift, e.g. the sensor records a value after 4 minutes instead of 5. This causes the inserted timestamps to be out of sync with the real readings for the remainder of the data frame.

Is there a way to complete this for a time interval that should be 5 minutes, but could be between 4 and 6 minutes? The dataset also includes multiple different IDs.

The ultimate aim is then to fill in the missing data gaps from the existing data, subject to a set criterion (i.e. fill at most 3 consecutive rows).

Reprex pasted below.

library(tsibble, warn.conflicts = FALSE)
#> Warning: package 'tsibble' was built under R version 4.1.1

Data <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
gl = c(125L, 133L, 132L, 130L, 133L, 135L, 166L, 161L, 67L, 66L, 67L, 69L, 67L), 
time = structure(list(sec = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
min = c(42L, 47L, 51L, 56L, 6L, 11L, 11L, 16L, 2L, 17L, 22L, 27L, 32L), 
hour = c(9L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 0L, 0L, 0L, 0L, 0L), 
mday = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), 
mon = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), 
year = c(121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L,121L), 
wday = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 0L, 0L, 0L, 0L,0L), 
yday = c(92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 93L, 93L,93L, 93L, 93L), 
isdst = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,0L, 0L, 0L, 0L)), 
class = c("POSIXlt", "POSIXt"), tzone = "GMT"), 
dif = structure(c(NA, 5, 4, 5, 10, 5, 60, 5, NA, 15, 5, 5, 5), 
units = "mins", class = "difftime")), 
class = c("grouped_df", "tbl_df", "tbl", "data.frame"), 
row.names = c(NA, -13L), groups = structure(list(id = 1:2, .rows = structure(list(1:8, 9:13), 
ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", "list"))), 
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L), .drop = TRUE))

x <- new_interval(minute = 5) 
tsdata <- build_tsibble(Data, key = id, index = time, interval = x)
tsdata <- fill_gaps(tsdata, .full = FALSE) 

Upvotes: 0

Views: 118

Answers (1)

Wimpel

Reputation: 27732

This is probably not a final answer to what you are looking for, but it might get you started. It builds a minute-by-minute time series per id, joins in the measured values, and linearly interpolates gl between them.

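library(data.table)
library(zoo)
# DT is assumed to be the question's Data as a data.table; the POSIXlt
# time column is converted to POSIXct, which data.table can store
DT <- data.table(id = Data$id, gl = Data$gl, time = as.POSIXct(Data$time))
# Split to list by id
L <- split(DT, by = "id")
# Interpolate gl based on time
ans <- lapply(L, function(x) {
  # build time series by minute (by = 60 seconds)
  temp <- data.table::data.table(
    id = unique(x$id), 
    time = seq(min(x$time), max(x$time), by = 60))
  # join in measured data
  temp[x, gl_measured := i.gl, on = .(time)]
  # interpolate gl-values linearly between measurements
  temp[, gl_approx := zoo::na.approx(gl_measured)]
  })
# Bind list together again
final <- data.table::rbindlist(ans)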
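
If you would rather keep one row per expected reading instead of a by-minute grid, here is a minimal base R sketch of another way to look at it. It is not the tsibble approach from the question: instead of forcing a fixed 5-minute grid, it counts how many nominal steps fit in each gap (so 4 to 6 minutes counts as one step and nothing is inserted) and spreads the inserted timestamps evenly between the two real readings. `insert_missing()` and `fill_short_gaps()` are hypothetical helper names made up for this example, and the "at most 3 rows" rule is my reading of the question. Note that `round()` decides borderline gaps such as 7.5 minutes, so adjust that rule if your sensor behaves differently.

```r
# Hypothetical helper (not from any package): add NA rows where readings
# are missing, tolerating drift around the nominal step.
insert_missing <- function(time, gl, step_mins = 5) {
  secs <- as.numeric(time)
  stopifnot(length(time) == length(gl), !is.unsorted(secs))
  # Gap between neighbours in whole steps: 4-6 min rounds to 1, 10 min to 2
  gap_steps <- round(diff(secs) / (60 * step_mins))
  n_missing <- pmax(gap_steps - 1, 0)
  # Spread the inserted stamps evenly between the two real readings
  new_secs <- unlist(lapply(seq_along(n_missing), function(i) {
    k <- n_missing[i]
    secs[i] + seq_len(k) * (secs[i + 1] - secs[i]) / (k + 1)
  }))
  out <- data.frame(
    secs = c(secs, new_secs),
    gl = c(gl, rep(NA, length(new_secs))),
    inserted = rep(c(FALSE, TRUE), c(length(secs), length(new_secs)))
  )
  out <- out[order(out$secs), ]
  tz <- attr(time, "tzone")
  out$time <- as.POSIXct(out$secs, origin = "1970-01-01",
                         tz = if (is.null(tz)) "" else tz)
  rownames(out) <- NULL
  out[, c("time", "gl", "inserted")]
}

# Hypothetical helper: linear interpolation, but only for runs of NA
# that are at most max_fill rows long; longer runs stay NA.
fill_short_gaps <- function(time, gl, max_fill = 3) {
  is_na <- is.na(gl)
  runs <- rle(is_na)
  too_long <- rep(runs$values & runs$lengths > max_fill, runs$lengths)
  t_num <- as.numeric(time)
  filled <- approx(t_num[!is_na], gl[!is_na], xout = t_num)$y
  filled[too_long] <- NA
  filled
}

# The id == 1 readings from the question
time <- as.POSIXct(c("2021-04-03 09:42", "2021-04-03 09:47", "2021-04-03 09:51",
                     "2021-04-03 09:56", "2021-04-03 10:06", "2021-04-03 10:11",
                     "2021-04-03 11:11", "2021-04-03 11:16"), tz = "GMT")
gl <- c(125L, 133L, 132L, 130L, 133L, 135L, 166L, 161L)

res <- insert_missing(time, gl)
res$gl_filled <- fill_short_gaps(res$time, res$gl, max_fill = 3)
# The single inserted 10:01 row is interpolated; the 11 rows inserted
# into the 60-minute gap stay NA because the run is longer than max_fill.
```

For several ids you would run the two helpers per id, e.g. with `split()` and `lapply()` as in the code above, and bind the pieces back together. If you prefer zoo, `na.approx()` also has a `maxgap` argument that serves the same purpose as `fill_short_gaps()`.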

Upvotes: 1
