signalstone

Reputation: 47

Interpolation of time series of missing values in a column in r

I have looked at the imputeTS and zoo packages, but they do not seem to work. My current data is:

group   timeseries (character)
  1   2017-05-17 04:00:00
  1   2017-05-17 04:01:00
  1           NA
  1           NA
  1   2017-05-17 05:00:00
  1   2017-05-17 06:00:00
  2           NA
  2   2017-05-17 04:31:00
  2           NA
  2           NA
  2           NA
  2   2017-05-17 05:31:00

I would like to fill in each NA by interpolating the time series, so that the missing time is the midpoint between the rows before and after it. I should also point out that each time series belongs to a group, meaning the time resets for each group.
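As a quick illustration of the arithmetic, assuming the timestamps are POSIXct in UTC: if only one row were missing between 04:01 and 05:00, its midpoint would be 04:30:30. For the two consecutive missing rows of group 1, linear interpolation instead spaces the filled times evenly between the known neighbours.

```r
# Known neighbours of the gap in group 1 (time zone assumed to be UTC)
before <- as.POSIXct("2017-05-17 04:01:00", tz = "UTC")
after  <- as.POSIXct("2017-05-17 05:00:00", tz = "UTC")

# One missing row: the midpoint is the start plus half the elapsed time
mid <- before + (after - before) / 2
format(mid, "%H:%M:%S")  # "04:30:30"

# Two missing rows: evenly spaced points between the two neighbours
filled <- seq(before, after, length.out = 4)
format(filled, "%H:%M:%S")  # "04:01:00" "04:20:40" "04:40:20" "05:00:00"
```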

I have attached a picture of the actual data to make this clearer. [image of the actual data]

Thanks for the help in advance!

Upvotes: 1

Views: 1412

Answers (2)

Steffen Moritz

Reputation: 7730

The interpolation functions in imputeTS and zoo do not accept characters or timestamps as input (and interpolating characters usually does not make sense anyway).

You can, however, pass characters to zoo's na.locf function, which carries the last observation forward.
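To make the difference from interpolation concrete, here is a minimal base-R sketch of what last-observation-carried-forward does. The helper locf is hypothetical and only mimics the basic behaviour of zoo::na.locf (it is not zoo's implementation); it repeats earlier values rather than computing times in between.

```r
# Hypothetical helper mimicking last-observation-carried-forward:
# each NA is replaced by the most recent non-NA value before it.
locf <- function(x) {
  # position of the latest non-NA element at each index (0 = none seen yet)
  last_seen <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  # leading NAs have nothing to carry forward, so they stay NA
  x[ifelse(last_seen == 0L, NA, last_seen)]
}

locf(c(NA, "04:00", NA, NA, "05:00", NA))
# NA "04:00" "04:00" "04:00" "05:00" "05:00"
```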

For your task, the following should work best (I am assuming your dates are stored as POSIXct):

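# Convert the POSIXct timestamps to numbers (seconds since 1970-01-01 UTC)
# and interpolate the NAs linearly
temp <- imputeTS::na_interpolation(as.numeric(input))

# Transform the numeric values back to POSIXct dates
# (the origin must match the one used by as.numeric())
as.POSIXct(temp, origin = "1970-01-01", tz = "UTC")

Here, "input" is your vector of POSIXct timestamps. Because as.numeric() returns the number of seconds since 1970-01-01 00:00:00 UTC, the origin in the as.POSIXct() call must be "1970-01-01"; set tz (the time zone) to match your timestamps. Note that this treats the whole vector as one series, so for your grouped data it has to be applied to each group separately.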
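Since the question's timestamps reset per group, here is a minimal base-R sketch of the same numeric round trip applied group by group. It swaps imputeTS::na_interpolation for stats::approx so it runs without extra packages; interpolate_times is a hypothetical helper name, and the data are the question's values, assumed to be in UTC.

```r
# Hypothetical helper: linearly interpolate NA timestamps within each group.
# stats::approx() stands in for imputeTS::na_interpolation() here.
interpolate_times <- function(times, group, tz = "UTC") {
  secs <- as.numeric(times)  # seconds since 1970-01-01 00:00:00 UTC
  filled <- ave(secs, group, FUN = function(y) {
    ok <- !is.na(y)
    if (sum(ok) < 2) return(y)  # approx() needs at least two known values
    # Interpolate on row position; positions outside the known range
    # (e.g. a leading NA) stay NA because approx() defaults to rule = 1
    approx(which(ok), y[ok], xout = seq_along(y))$y
  })
  # Convert back with the same origin that as.numeric() used
  as.POSIXct(filled, origin = "1970-01-01", tz = tz)
}

# The question's data, assumed to be in UTC
group <- rep(1:2, each = 6)
times <- as.POSIXct(c("2017-05-17 04:00:00", "2017-05-17 04:01:00", NA, NA,
                      "2017-05-17 05:00:00", "2017-05-17 06:00:00",
                      NA, "2017-05-17 04:31:00", NA, NA, NA,
                      "2017-05-17 05:31:00"), tz = "UTC")
result <- interpolate_times(times, group)
print(data.frame(group, result))
```

Passing rule = 2 to approx() would instead fill the leading NA of group 2 with the nearest known time (04:31:00).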

Upvotes: 1

Chris Holbrook

Reputation: 2636

na.approx in the zoo package can do this, and the grouping can be handled without explicit loops, using either tapply in base R or a grouped operation in data.table.

For your data set:

df <- read.table(text=c("
  group   timeseries
  1   '2017-05-17 04:00:00'
  1   '2017-05-17 04:01:00'
  1   NA
  1   NA
  1   '2017-05-17 05:00:00'
  1   '2017-05-17 06:00:00'
  2   NA
  2   '2017-05-17 04:31:00'
  2   NA
  2   NA
  2   NA
  2   '2017-05-17 05:31:00'
"), 
colClasses = c("integer", "POSIXct"),
header = TRUE)

Write a function that coerces the vector to a zoo object, interpolates the NAs, and extracts the result:

library(zoo)
foo <- function(x) coredata(na.approx(zoo(x), na.rm = FALSE))

Example using tapply in base R to apply foo to each group

df2 <- df  # make a copy
df2$timeseries <- do.call(c, tapply(df2$timeseries, INDEX = df2$group, foo))
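One caveat with the tapply version: tapply returns its results in sorted group order, so do.call(c, ...) only lines up with the rows when df is already sorted by group, as it is here. A minimal base-R sketch that keeps the original row order uses the split<- replacement function. Here interp_na is a hypothetical, package-free stand-in for foo built on stats::approx, and df3 is made-up data with interleaved groups (UTC assumed).

```r
# Hypothetical stand-in for foo(): linear interpolation on row position,
# done on the numeric seconds and converted back to POSIXct (UTC assumed)
interp_na <- function(x) {
  secs <- as.numeric(x)
  ok <- !is.na(secs)
  if (sum(ok) >= 2) {
    secs <- approx(which(ok), secs[ok], xout = seq_along(secs))$y
  }
  as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
}

# Made-up data with the groups interleaved rather than sorted
df3 <- data.frame(
  group = c(2, 1, 2, 1, 2, 1),
  timeseries = as.POSIXct(c("2017-05-17 04:00:00", "2017-05-17 05:00:00", NA,
                            NA, "2017-05-17 04:20:00", "2017-05-17 06:00:00"),
                          tz = "UTC")
)

# split<- writes each group's result back into that group's original rows
split(df3$timeseries, df3$group) <- lapply(split(df3$timeseries, df3$group), interp_na)
print(df3)
```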

Example using a grouped operation (by =) in data.table to apply foo to each group:

library(data.table)
DT <- data.table(df)
DT[, timeseries := foo(timeseries), by = "group"]

Upvotes: 1
