Reputation: 4243
I have a dataset like below:
Date Metric1 Metric2 Metric3 Metric4
2017-01-01 NA 3 NA 7
2017-01-02 NA 4 NA 10
2017-01-03 NA 2 NA 18
2017-01-04 5 8 NA 20
2017-01-05 8 9 87 34
2017-01-06 10 2 45 12
. . . . .
. . . . .
. . . . .
2018-09-01 12 13 14 15
2018-09-02 34 12 28 19
2018-09-03 45 12 45 34
2018-09-04 NA 14 49 11
2018-09-05 NA 11 90 12
2018-09-06 NA 15 NA 32
2018-09-07 NA 23 NA 43
2018-09-08 NA 12 NA 22
My dataset has 100 columns. There are no missing values in between the NA
s in their respective columns. Does anyone know a package or a function that will forecast or use a moving average for the values before and after the first or last numeric value?
I have done some research on this so far and the best I can find is na.fill
but that will just repeat values at the beginning and end of columns.
Upvotes: 1
Views: 85
Reputation: 2218
You can use the imputeTS
package to impute the missing values. For moving average you can do something like:
library(imputeTS)
ts_df[,2:5] <- apply(ts_df[,2:5], 2, na_ma, k = 6) # k = width of moving average
ts_df
Date Metric1 Metric2 Metric3 Metric4
1 2017-01-01 6.933333 3 64.57143 7
2 2017-01-02 7.806452 4 62.13333 10
3 2017-01-03 8.396825 2 61.58065 18
4 2017-01-04 5.000000 8 61.38095 20
5 2017-01-05 8.000000 9 87.00000 34
6 2017-01-06 10.000000 2 45.00000 12
7 2018-09-01 12.000000 13 14.00000 15
8 2018-09-02 34.000000 12 28.00000 19
9 2018-09-03 45.000000 12 45.00000 34
10 2018-09-04 33.984127 14 49.00000 11
11 2018-09-05 34.451613 11 90.00000 12
12 2018-09-06 35.333333 15 66.80952 32
13 2018-09-07 37.142857 23 67.16129 43
14 2018-09-08 41.333333 12 68.93333 22
Refer R documentation for more time series related imputation techniques in imputeTS package.
Data:
ts_df <- read.table(text = " Date Metric1 Metric2 Metric3 Metric4
2017-01-01 NA 3 NA 7
2017-01-02 NA 4 NA 10
2017-01-03 NA 2 NA 18
2017-01-04 5 8 NA 20
2017-01-05 8 9 87 34
2017-01-06 10 2 45 12
2018-09-01 12 13 14 15
2018-09-02 34 12 28 19
2018-09-03 45 12 45 34
2018-09-04 NA 14 49 11
2018-09-05 NA 11 90 12
2018-09-06 NA 15 NA 32
2018-09-07 NA 23 NA 43
2018-09-08 NA 12 NA 22" , header = T, colClasses = c("Date" = "Date"))
Upvotes: 2