Reputation: 1
I would like to insert rows where data are missing in a glucose sensor dataset recorded at 5-minute intervals. I have managed to do this with the tsibble package, but the timestamps can drift, e.g. the sensor records a value after 4 minutes instead of 5. This causes the inserted timestamps to fall out of sync for the remainder of the data frame.
Is there a way to do this for an interval that should be 5 minutes but could be anywhere between 4 and 6 minutes? The dataset also contains multiple IDs.
The ultimate aim is then to fill the gaps from the existing data, subject to a set criterion (i.e. max fill <= 3 rows).
Reprex pasted below.
library(tsibble, warn.conflicts = FALSE)
#> Warning: package 'tsibble' was built under R version 4.1.1
Data <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
gl = c(125L, 133L, 132L, 130L, 133L, 135L, 166L, 161L, 67L, 66L, 67L, 69L, 67L),
time = structure(list(sec = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
min = c(42L, 47L, 51L, 56L, 6L, 11L, 11L, 16L, 2L, 17L, 22L, 27L, 32L),
hour = c(9L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 0L, 0L, 0L, 0L, 0L),
mday = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L),
mon = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
year = c(121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L,121L),
wday = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 0L, 0L, 0L, 0L,0L),
yday = c(92L, 92L, 92L, 92L, 92L, 92L, 92L, 92L, 93L, 93L,93L, 93L, 93L),
isdst = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,0L, 0L, 0L, 0L)),
class = c("POSIXlt", "POSIXt"), tzone = "GMT"),
dif = structure(c(NA, 5, 4, 5, 10, 5, 60, 5, NA, 15, 5, 5, 5),
units = "mins", class = "difftime")),
class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
row.names = c(NA, -13L), groups = structure(list(id = 1:2, .rows = structure(list(1:8, 9:13),
ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", "list"))),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L), .drop = TRUE))
x <- new_interval(minute = 5)
tsdata <- build_tsibble(Data, key = id, index = time, interval = x)
tsdata <- fill_gaps(tsdata, .full = FALSE)
Upvotes: 0
Views: 118
Reputation: 27732
This is probably not a complete answer to your question, but it might get you started.
library(data.table)
library(zoo)
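# DT is not defined in the answer as posted; assuming it is the question's
# Data as a data.table, with time converted from POSIXlt to POSIXct
DT <- data.table::data.table(id = Data$id, gl = Data$gl, time = as.POSIXct(Data$time))
# Split to list by id
L <- split(DT, by = "id")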
# Interpolate gl based on time
ans <- lapply(L, function(x) {
# build time series by minute
temp <- data.table::data.table(
id = unique(x$id),
time = seq(min(x$time), max(x$time), by = 60))
# join in measured data
temp[x, gl_measured := i.gl, on = .(time)]
# interpolate gl values
temp[, gl_approx := zoo::na.approx(gl_measured)]
})
# Bind list together again
final <- data.table::rbindlist(ans)
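If the per-minute grid above is finer than you need, a different approach is sketched below. This is not tsibble's method, just one possible way to deal with the drift: round each gap between consecutive readings to a whole number of 5-minute steps, so a 4 or 6 minute gap still counts as one step and drift never accumulates. Missing steps are then inserted per id, and `zoo::na.approx()` with `maxgap = 3` fills only runs of up to 3 missing rows. The example data and the column names `step`, `slot`, `inserted` and `gl_filled` are made up for illustration, and the sketch assumes no two readings are less than 2.5 minutes apart.

```r
library(data.table)
library(zoo)

# Hypothetical example data with drifted timestamps (not the real dataset)
DT <- data.table(
  id   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  gl   = c(125, 133, 132, 135, 67, 66, 69),
  time = as.POSIXct(c("2021-04-03 09:42:00", "2021-04-03 09:47:00",
                      "2021-04-03 09:51:00", "2021-04-03 10:11:00",
                      "2021-04-04 00:02:00", "2021-04-04 00:07:00",
                      "2021-04-04 00:37:00"), tz = "GMT")
)
setorder(DT, id, time)

# Number of 5-minute steps since the previous reading, per id.
# A gap of 4 to 6 minutes rounds to 1 step, 10 minutes to 2, and so on.
DT[, step := c(0, round(as.numeric(diff(time), units = "mins") / 5)), by = id]

# Running slot index per id; missing slots are the gaps to insert
DT[, slot := as.integer(cumsum(step)), by = id]
DT[, step := NULL]

# Full sequence of slots per id, then join the measured rows onto it
grid   <- DT[, .(slot = seq.int(0L, max(slot))), by = id]
filled <- DT[grid, on = .(id, slot)]
filled[, inserted := is.na(time)]

# Timestamps of inserted rows: interpolated between the neighbouring readings
filled[, time := as.POSIXct(zoo::na.approx(as.numeric(time), na.rm = FALSE),
                            origin = "1970-01-01", tz = "GMT"), by = id]

# Fill gl only where at most 3 consecutive rows are missing
filled[, gl_filled := zoo::na.approx(gl, maxgap = 3, na.rm = FALSE), by = id]

print(filled)
```

In this example id 1 has a gap of 3 missing rows, which gets filled, while id 2 has a gap of 5 missing rows, which is left as `NA`. Linear interpolation is only one choice for the fill step; `zoo::na.locf()` also takes a `maxgap` argument if you would rather carry the last value forward.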
Upvotes: 1