Reputation: 107
I have a problem at hand.
Objective: I have a monthly time series data set that contains both zeros and NAs. The zeros are real values that I do not want to change, whereas the NAs are missing values that I want to impute using StructTS in R.
Data set example
dataset <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
dataset[1, 2] <- 0
dataset[4, 4] <- 0
In this data set, I only want to replace the NAs with imputed values and leave the zeros as they are.
After researching and reading several blogs, I used the following methods:
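missvalue <- function(df) {
  x <- df
  # Reverse the series and treat it as monthly data
  x <- ts(rev(x), frequency = 12)
  # Smoothed states from the structural model; drop column 2 and sum the rest
  fit <- ts(rowSums(tsSmooth(StructTS(x))[, -2]))
  tsp(fit) <- tsp(x)
  return(list(N = fit))
}
Newdata <- lapply(m, missvalue)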
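One way to keep the observed values, zeros included, is to take the smoothed states from StructTS and copy them only into the NA positions instead of replacing the whole series. This is a minimal sketch, assuming a single monthly series that is long enough for a basic structural model. The helper name impute_structts and the demo series built from AirPassengers are illustrative and not part of the question:

```r
# Hypothetical helper: fill only the NA positions of a monthly series with
# the Kalman-smoothed level + seasonal component of a StructTS fit.
impute_structts <- function(x, frequency = 12) {
  x_ts <- ts(as.numeric(x), frequency = frequency)
  fit <- StructTS(x_ts, type = "BSM")   # basic structural model
  states <- tsSmooth(fit)               # smoothed states: level, slope, seasonal
  smoothed <- rowSums(states[, -2])     # level + seasonal, as in the question
  missing <- is.na(x_ts)
  x_ts[missing] <- smoothed[missing]    # observed values (and zeros) are not touched
  x_ts
}

# Demo series (an assumption for illustration): AirPassengers with gaps and one zero
y <- as.numeric(AirPassengers)
y[c(10, 11, 50, 100)] <- NA
y[30] <- 0
filled <- impute_structts(y)
```

Because only the positions flagged by is.na() are overwritten, a zero in the input is still a zero in the output. StructTS can be sensitive on short or irregular series, so the fit is worth checking before trusting the imputed values.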
I also tried a mean technique:
## Alternative missing-value treatment
nzmean <- function(x) {
  if (all(x == 0)) 0 else mean(x[x != 0])
}
apply(m, 1, nzmean)
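Note that `x != 0` is NA wherever x is NA, so `mean(x[x != 0])` typically comes back as NA for rows that contain missing values. If the goal is plain mean imputation that leaves zeros alone, a minimal base-R sketch could look like the following. It assumes each column is one series; the helper name impute_mean and the hand-filled example matrix are illustrative:

```r
# Hypothetical helper: replace NA with the mean of the observed values.
# Zeros count as ordinary observations and are left untouched.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Example matrix filled column by column (illustrative values)
m <- matrix(c(1, 5, 3, NA, 4,
              0, 1, 2, 2, 5,
              4, 3, 4, 2, 2,
              1, 5, 2, 0, 4,
              1, NA, NA, 1, 1), nrow = 5)

# Impute each column (each series) separately
m_filled <- apply(m, 2, impute_mean)
print(m_filled)
```

A column that consists only of NAs would end up as NaN with this helper, so such columns need separate handling.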
Attached are the posts I referred to:
Any help on this would be much appreciated.
Upvotes: 1
Views: 3741
Reputation: 7730
I can recommend the imputeTS package here (I am the maintainer). It makes this task really easy. (https://cran.r-project.org/web/packages/imputeTS/index.html)
It offers several imputation algorithms, including mean, median, linear interpolation, spline interpolation, and Kalman smoothing.
Here is one example:
library(imputeTS)
dataset[ ,1] <- na.kalman(dataset[ ,1])
Another one:
dataset[ ,1] <- na.interpolation(dataset[ ,1])
Another one:
dataset[ ,1] <- na.mean(dataset[ ,1])
Another one:
dataset[ ,1] <- na.locf(dataset[ ,1])
The only downside is that the package does not accept a data.frame as input, so you have to loop through the columns separately. On the positive side, this lets you use a different algorithm for each column.
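Looping over the columns can be done with apply. This is a minimal sketch, assuming a recent imputeTS release is installed; in recent releases the functions are spelled with underscores (na_interpolation, na_kalman, na_mean, na_locf) rather than dots. The example matrix is made up for illustration:

```r
# Runs only if imputeTS is installed (assumption: a release with underscore names)
if (requireNamespace("imputeTS", quietly = TRUE)) {
  # Example matrix filled column by column (illustrative values)
  m <- matrix(c(1, 5, 3, NA, 4,
                0, 1, 2, 2, 5,
                4, 3, 4, 2, 2,
                1, 5, 2, 0, 4,
                1, NA, NA, 1, 1), nrow = 5)

  # Impute each column separately; zeros are treated as ordinary values
  filled_cols <- apply(m, 2, imputeTS::na_interpolation)
  print(filled_cols)
}
```

To use a different algorithm per column, the apply call can be swapped for an explicit loop that picks the function by column index.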
Upvotes: 3
Reputation: 28461
na.approx is a useful function from the zoo package. It fills missing values by linear interpolation between the neighbouring observations (na.spline is the spline-based counterpart). See ?na.approx for more information on parameter options and applications. It only fills NA entries and leaves zeros untouched. Hope that helps.
library(zoo)
na.approx(dataset)
     [,1] [,2] [,3] [,4] [,5]
[1,]  1.0    0    4    1    1
[2,]  5.0    1    3    5    1
[3,]  3.0    2    4    2    1
[4,]  3.5    2    2    0    1
[5,]  4.0    5    2    4    1
Data
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    4    1    1
[2,]    5    1    3    5   NA
[3,]    3    2    4    2   NA
[4,]   NA    2    2    0    1
[5,]    4    5    2    4    1
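For interior gaps, this amounts to linear interpolation between the neighbouring observed values, applied to each column. A rough base-R sketch of the same idea with stats::approx is shown here; the helper name interpolate_na is illustrative, and unlike zoo this sketch has no options for leading or trailing NAs:

```r
# Hypothetical helper: linearly interpolate interior NAs in one series.
# Needs at least two observed values in the series.
interpolate_na <- function(x) {
  idx <- seq_along(x)
  observed <- !is.na(x)
  # approx() returns NA outside the range of observed points (rule = 1),
  # so leading and trailing NAs stay NA.
  approx(idx[observed], x[observed], xout = idx)$y
}

# The data shown above, filled column by column
m <- matrix(c(1, 5, 3, NA, 4,
              0, 1, 2, 2, 5,
              4, 3, 4, 2, 2,
              1, 5, 2, 0, 4,
              1, NA, NA, 1, 1), nrow = 5)

result <- apply(m, 2, interpolate_na)
print(result)
```

Only positions that were NA change; every observed value, including the zeros, passes through unchanged.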
Upvotes: 3