Replace missing values in a time series dataset with both NA and Zero

I have a problem at hand.

Objective: I have a monthly time series data set that contains zeros as well as NAs. The zeros are genuine values that I want to leave unchanged, whereas the NAs are missing values that I want to impute using StructTS in R.

Data set example

    dataset <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
    dataset[1, 2] <- 0
    dataset[4, 4] <- 0

In this dataset, I just want to replace each NA with an imputed value and leave the zeros as zeros.
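To make the distinction concrete, here is a minimal sketch of how the two kinds of entries can be told apart in base R. It uses a fixed 5x5 matrix instead of the random one (an assumption, purely so the counts are reproducible); `na_mask` and `zero_mask` are illustrative names.

```r
# Fixed example matrix (assumed for reproducibility; sample() would differ per run)
dataset <- matrix(c( 1, 0, 4, 1,  1,
                     5, 1, 3, 5, NA,
                     3, 2, 4, 2, NA,
                    NA, 2, 2, 0,  1,
                     4, 5, 2, 4,  1),
                  nrow = 5, byrow = TRUE)

# TRUE where a value is missing and should be imputed
na_mask <- is.na(dataset)

# TRUE where a value is an observed zero and should stay as it is
zero_mask <- !na_mask & dataset == 0

n_missing <- sum(na_mask)
n_zero    <- sum(zero_mask)
```

Any imputation that only assigns into `dataset[na_mask]` cannot alter the zeros.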

After researching and reading several blogs, I used the following methods:

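    missvalue <- function(df) {
      # Reverse the series and treat it as monthly data
      x <- ts(rev(df), frequency = 12)

      # Sum the smoothed components, dropping column 2 (the slope in the default BSM fit)
      fit <- ts(rowSums(tsSmooth(StructTS(x))[, -2]))
      tsp(fit) <- tsp(x)
      list(N = fit)
    }

    Newdata <- lapply(m, missvalue)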

I also tried a mean technique:

    ## Another missing-value treatment

    nzmean <- function(x) {
      x <- x[!is.na(x)]  # drop NAs so the comparisons below never return NA
      if (all(x == 0)) 0 else mean(x[x != 0])
    }
    apply(m, 1, nzmean)

Attached are the posts I referred:

Any help on this would be really great.

Upvotes: 1

Answers (2)

Steffen Moritz

Reputation: 7730

I can recommend the imputeTS package here (I am the maintainer). Makes life really easy for this task. (https://cran.r-project.org/web/packages/imputeTS/index.html)

It offers several algorithms, such as imputation with the mean or median, linear interpolation, spline interpolation, Kalman smoothing, ...

Here is one example:

    library(imputeTS)
    # Function names as of this answer; imputeTS 3.0 renamed them to na_kalman(), na_interpolation(), etc.
    dataset[, 1] <- na.kalman(dataset[, 1])

Another one:

    dataset[, 1] <- na.interpolation(dataset[, 1])

Another one:

    dataset[, 1] <- na.mean(dataset[, 1])

Another one:

    dataset[, 1] <- na.locf(dataset[, 1])
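For the Kalman smoothing option, the underlying idea can be roughly sketched with base R's `StructTS` and `tsSmooth`, which is what the question was attempting. This is only an illustration under stated assumptions, not the package's implementation: it uses the built-in `Nile` series with a few values blanked out by hand and a simple local level model (`type = "level"`). For seasonal monthly data, `type = "BSM"` would be the natural alternative.

```r
# Built-in annual series with three artificially missing values (assumed example data)
x <- Nile
x[c(20, 21, 60)] <- NA
miss <- is.na(x)

# StructTS tolerates NAs as long as the first observation is present
fit <- StructTS(x, type = "level")

# Smoothed state estimates for every time point, including the gaps
sm <- tsSmooth(fit)

# Fill only the missing positions; observed values are left as they are
filled <- x
filled[miss] <- as.numeric(sm)[miss]
```

Because only `filled[miss]` is assigned, observed values (including any zeros) are never overwritten by the smoother.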

The only downside is that the package does not allow a data.frame as input, so one has to loop through the columns separately. (On the positive side, you could also use different algorithms for different columns.)
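Looping over the columns with a different method per column could look like the sketch below. To keep it runnable without any package, it uses two small base R stand-ins, `fill_locf` and `fill_mean` (hypothetical helpers written for this example, not imputeTS functions), on a fixed example matrix (also an assumption).

```r
# Hypothetical helper: carry the last observed value forward into NA positions
fill_locf <- function(x) {
  last_seen <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  has_prev <- last_seen > 0          # leading NAs have nothing to carry forward
  filled <- x
  filled[has_prev] <- x[last_seen[has_prev]]
  filled
}

# Hypothetical helper: replace NAs with the mean of the observed values
fill_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Fixed example matrix (assumed for reproducibility)
dataset <- matrix(c( 1, 0, 4, 1,  1,
                     5, 1, 3, 5, NA,
                     3, 2, 4, 2, NA,
                    NA, 2, 2, 0,  1,
                     4, 5, 2, 4,  1),
                  nrow = 5, byrow = TRUE)

# One method per column index; columns not listed are left alone
methods <- list("1" = fill_locf, "5" = fill_mean)

imputed <- dataset
for (j in as.integer(names(methods))) {
  imputed[, j] <- methods[[as.character(j)]](dataset[, j])
}
```

With imputeTS installed, the entries of `methods` could be the package's own functions instead of these stand-ins.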

Upvotes: 3

Pierre L

Reputation: 28461

na.approx is a useful function from the 'zoo' package. It replaces missing values by interpolation (linear by default; na.spline in the same package does spline interpolation). See ?na.approx for more information on parameter options and applications. It only touches NA entries and leaves zeroes untouched. Hope that helps.

    library(zoo)
    na.approx(dataset)
         [,1] [,2] [,3] [,4] [,5]
    [1,]  1.0    0    4    1    1
    [2,]  5.0    1    3    5    1
    [3,]  3.0    2    4    2    1
    [4,]  3.5    2    2    0    1
    [5,]  4.0    5    2    4    1

Data

         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    0    4    1    1
    [2,]    5    1    3    5   NA
    [3,]    3    2    4    2   NA
    [4,]   NA    2    2    0    1
    [5,]    4    5    2    4    1
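To see where the 3.5 in column 1 comes from, here is a rough base R sketch of column-wise linear interpolation using `stats::approx`. It illustrates the idea rather than zoo's actual code; `interp_col` is a hypothetical helper, and the matrix is typed in from the Data above.

```r
# The Data matrix from above, entered by hand
dataset <- matrix(c( 1, 0, 4, 1,  1,
                     5, 1, 3, 5, NA,
                     3, 2, 4, 2, NA,
                    NA, 2, 2, 0,  1,
                     4, 5, 2, 4,  1),
                  nrow = 5, byrow = TRUE)

# Hypothetical helper: linearly interpolate the NAs of one column
interp_col <- function(y) {
  if (!anyNA(y)) return(y)           # nothing to fill
  idx <- seq_along(y)
  ok <- !is.na(y)
  # Interpolate only at the missing positions; leading/trailing NAs would stay NA
  y[!ok] <- approx(idx[ok], y[ok], xout = idx[!ok])$y
  y
}

result <- apply(dataset, 2, interp_col)
```

In column 1 the gap at row 4 sits halfway between 3 (row 3) and 4 (row 5), hence 3.5. In column 5 both neighbours of the gap are 1, so the filled values are 1 as well.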

Upvotes: 3
