Piecewise interpolation for entire data.frame R

Question

I have a dataset from a source that uses a special compression algorithm. Simply put, new measurements are recorded only when the change in slope (rate of change) is greater than a certain percentage (say 5%).

However, for the analysis I'm currently carrying out, I need values at regular intervals. I am able to carry out a piecewise interpolation using approx, approxfun or spline for individual variables against time (tme in the data below), but I'd like to do it for all variables (columns of the data.table) in a single shot.

library(data.table)
q = setDT(
structure(list(tme = structure(c(1463172120, 1463173320, 1463175720, 
1463180520, 1463182920, 1463187720, 1463188920, 1463190120, 1463191320, 
1463192520, 1463202180, 1463203380, 1463204580, 1463205780, 1463206980, 
1463208180, 1463218980, 1463233440, 1463244240, 1463245440, 1463246640, 
1463247840, 1463249040, 1463250240, 1463251440, 1463252640, 1463253840, 
1463255040, 1463256240, 1463316360, 1463317560, 1463318760, 1463319960, 
1463321160, 1463322360, 1463323560, 1463324760, 1463325960, 1463327160, 
1463328360, 1463329560, 1463330760, 1463331960), class = c("POSIXct", 
"POSIXt"), tzone = "America/Montreal"), rh = c(50.36, 47.31, 
46.39, 46.99, 47.89, 50.37, 51.29, 51.92, 54.97, 67.64, 69.38, 
68.96, 69.89, 56.66, 51.23, 55.38, 64.36, 50.72, 31.33, 31.38, 
32.65, 33.15, 33.05, 31.87, 32.58, 32.65, 31.06, 29.82, 28.72, 
67.95, 66.68, 64.66, 62.12, 59.86, 58.11, 57.41, 56.5, 56.16, 
55.69, 54.57, 53.89, 53.81, 52.01), degc = c(30.0055555555556, 
30.3611111111111, 30.6611111111111, 30.5833333333333, 30.2666666666667, 
28.6888888888889, 28.2555555555556, 28.0722222222222, 27.4944444444444, 
25.0722222222222, 24.8111111111111, 24.7166666666667, 24.1666666666667, 
25.4111111111111, 25.5222222222222, 24.3555555555556, 22.7722222222222, 
25.5222222222222, 27.8111111111111, 27.9888888888889, 28.0277777777778, 
28.1333333333333, 28.5333333333333, 28.7, 28.85, 29.1555555555556, 
28.8388888888889, 29.5111111111111, 29.6722222222222, 22.3888888888889, 
22.5722222222222, 22.9444444444444, 23.3722222222222, 23.6777777777778, 
23.8777777777778, 24.2055555555556, 24.6888888888889, 24.9777777777778, 
25.3888888888889, 25.8, 26.1, 26.1555555555556, 26.7388888888889
)), .Names = c("tme", "rh", "degc"), row.names = c(NA, -43L), class = c("data.table", 
"data.frame")))

q is my queried dataset. Here's what works for individual variables (degc in this example):

interpolate_degc <- approxfun(x = q$tme, y = q$degc, method = "linear")

# To get the uniform samples: 
width <- "10 mins"
new_times <- seq.POSIXt(from = q$tme[1], to = q$tme[nrow(q)], by = width)
new_degc <- interpolate_degc(new_times)

I'd like to do this for all variables in a single shot, preferably using data.table.

Frank · Accepted Answer

This seems to work:

cols = c("rh", "degc")
DT = q[.(seq(min(tme), max(tme), by="10 mins")), on=.(tme)]
DT[, (cols) := lapply(cols, function(z) with(q, 
  approxfun(x = tme, y = get(z), method = "linear")
)(tme))]

                     tme     rh     degc
  1: 2016-05-13 16:42:00 50.360 30.00556
  2: 2016-05-13 16:52:00 48.835 30.18333
  3: 2016-05-13 17:02:00 47.310 30.36111
  4: 2016-05-13 17:12:00 47.080 30.43611
  5: 2016-05-13 17:22:00 46.850 30.51111
 ---                                    
263: 2016-05-15 12:22:00 54.026 26.04000
264: 2016-05-15 12:32:00 53.866 26.11667
265: 2016-05-15 12:42:00 53.826 26.14444
266: 2016-05-15 12:52:00 53.270 26.33056
267: 2016-05-15 13:02:00 52.370 26.62222

Generally when you want to iterate over columns, lapply or Map will work.
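
For comparison, here is a minimal base R sketch of the lapply route without data.table. The toy data frame df and its values are made up for illustration (they are not the question's data), and the times are converted with as.numeric explicitly before being handed to approx.

```r
# Hypothetical irregular series, for illustration only
df <- data.frame(
  tme  = as.POSIXct(c(0, 600, 1800, 3600), origin = "1970-01-01", tz = "UTC"),
  rh   = c(50, 48, 46, 52),
  degc = c(30, 31, 29, 28)
)

cols <- c("rh", "degc")

# Regular 10-minute grid spanning the observed time range
new_times <- seq(min(df$tme), max(df$tme), by = "10 mins")

# Linearly interpolate every column in cols onto the grid
interp <- lapply(df[cols], function(y) {
  approx(x = as.numeric(df$tme), y = y,
         xout = as.numeric(new_times), method = "linear")$y
})

# Reassemble: one row per grid time, one column per variable
out <- data.frame(tme = new_times, interp)
```

Map should do the same job if each column needs its own settings, for instance a different method per column.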

How it works: Inside the with(q, ...), tme and get(z) refer to columns of q, but outside of it, we're looking at columns of DT (in this case just tme).
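
The scoping can be seen in isolation with a small sketch. The tables src and grid and the column val are hypothetical names chosen for this example; val is set equal to tme so the interpolated result is easy to check by eye.

```r
library(data.table)

# Hypothetical source table with irregular times; val equals tme on purpose
src  <- data.table(tme = c(0, 10, 30), val = c(0, 10, 30))

# Hypothetical target table holding only the regular grid
grid <- data.table(tme = seq(0, 30, by = 5))

# Inside with(src, ...), tme and val are columns of src.
# The trailing (tme) sits outside with(), so it is grid's tme.
grid[, val := with(src, approxfun(x = tme, y = val, method = "linear"))(tme)]
```

After this, grid$val should match grid$tme, since the function was built from src and then evaluated at grid's times.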

Another way of doing the same thing:

q[, {
  tt = seq(min(tme), max(tme), by="10 mins")

  c(
    .(tme = tt), 
    lapply(.SD, function(z) approxfun(x = tme, y = z, method="linear")(tt))
  )
}, .SDcols=cols]
