Nick Knauer
Nick Knauer

Reputation: 4243

Time Series Package that Replaces NA values as a Forecast

I have a dataset like below:

       Date     Metric1      Metric2       Metric3       Metric4      
 2017-01-01          NA            3            NA             7
 2017-01-02          NA            4            NA            10
 2017-01-03          NA            2            NA            18
 2017-01-04           5            8            NA            20
 2017-01-05           8            9            87            34
 2017-01-06          10            2            45            12
        .             .            .             .             .
        .             .            .             .             .
        .             .            .             .             .
 2018-09-01          12           13            14            15
 2018-09-02          34           12            28            19 
 2018-09-03          45           12            45            34
 2018-09-04          NA           14            49            11
 2018-09-05          NA           11            90            12 
 2018-09-06          NA           15            NA            32
 2018-09-07          NA           23            NA            43
 2018-09-08          NA           12            NA            22

My dataset has 100 columns. There are no missing values in between the NAs in their respective columns. Does anyone know a package or a function that will forecast or use a moving average for the values before and after the first or last numeric value?

I have done some research on this so far and the best I can find is na.fill but that will just repeat values at the beginning and end of columns.

Upvotes: 1

Views: 85

Answers (1)

Mankind_2000
Mankind_2000

Reputation: 2218

You can use the imputeTS package to impute the missing values. For moving average you can do something like:

library(imputeTS)

ts_df[,2:5] <- apply(ts_df[,2:5], 2, na_ma, k = 6)        # k = width of moving average

ts_df
         Date   Metric1 Metric2  Metric3 Metric4
1  2017-01-01  6.933333       3 64.57143       7
2  2017-01-02  7.806452       4 62.13333      10
3  2017-01-03  8.396825       2 61.58065      18
4  2017-01-04  5.000000       8 61.38095      20
5  2017-01-05  8.000000       9 87.00000      34
6  2017-01-06 10.000000       2 45.00000      12
7  2018-09-01 12.000000      13 14.00000      15
8  2018-09-02 34.000000      12 28.00000      19
9  2018-09-03 45.000000      12 45.00000      34
10 2018-09-04 33.984127      14 49.00000      11
11 2018-09-05 34.451613      11 90.00000      12
12 2018-09-06 35.333333      15 66.80952      32
13 2018-09-07 37.142857      23 67.16129      43
14 2018-09-08 41.333333      12 68.93333      22

Refer R documentation for more time series related imputation techniques in imputeTS package.

Data:

ts_df <- read.table(text = "      Date     Metric1      Metric2       Metric3       Metric4      
 2017-01-01          NA            3            NA             7
           2017-01-02          NA            4            NA            10
           2017-01-03          NA            2            NA            18
           2017-01-04           5            8            NA            20
           2017-01-05           8            9            87            34
           2017-01-06          10            2            45            12
           2018-09-01          12           13            14            15
           2018-09-02          34           12            28            19 
           2018-09-03          45           12            45            34
           2018-09-04          NA           14            49            11
           2018-09-05          NA           11            90            12 
           2018-09-06          NA           15            NA            32
           2018-09-07          NA           23            NA            43
           2018-09-08          NA           12            NA            22" , header = T, colClasses = c("Date" = "Date"))

Upvotes: 2

Related Questions