Reputation: 109
I have a dataset with a few missing observations. My goal is to impute each missing value in a variable with the mean of the previous 13 values of that variable. If a missing value occurs before the 13th observation, the mean of whatever observations precede it should be used instead. I am not sure how to do this.
Please use the code below to replicate my dataset. Your help is much appreciated.
df1 <- structure(list(V1 = c(276.12, 53.4, 20.64, 181.8, 216.96, 10.44,
69, 144.24, 10.32, 239.76, 79.32, 257.64, 28.56, 117, 244.92,
234.48, NA, 337.68, 83.04, 176.76, 262.08, 284.88, 15.84, NA,
74.76, 315.48, 171.48, 288.12, 298.56, 84.72, 351.48, 135.48,
NA, 318.72, 114.84, 348.84, 320.28, 89.64, 51.72, 273.6, 243,
212.4, 352.32, 248.28, NA, 210.12, 107.64, 287.88, 272.64, 80.28,
239.76, 120.48, 259.68, 219.12, 315.24, 238.68, 8.76, 163.44,
252.96), V2 = c(45.36, 47.16, 55.08, 49.56, 12.96, 58.68, 39.36,
NA, 2.52, 3.12, 6.96, 28.8, NA, 9.12, 39.48, 57.24, 43.92, 47.52,
24.6, 28.68, 33.24, 6.12, 19.08, 20.28, 15.12, 4.2, 35.16, NA,
32.52, 19.2, 33.96, 20.88, 1.8, 24, 1.68, NA, 52.56, 59.28, 32.04,
45.24, 26.76, 40.08, 33.24, 10.08, 30.84, 27, 11.88, 49.8, 18.96,
14.04, 3.72, 11.52, 50.04, 55.44, 34.56, NA, 33.72, 23.04, 59.52
)), class = "data.frame", row.names = c(NA, -59L))
Reputation: 60220
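You can use zoo::rollapply to compute a right-aligned rolling mean over a window of 13 values, ignoring NAs, and then shift the result down by one row so that each row holds the mean of the values before it. Setting partial = TRUE lets the window shrink near the start of the series, which covers missing values that occur before the 13th observation: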
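# Right-aligned rolling mean of up to 13 values, ignoring NAs.
# partial = TRUE lets the window shrink near the start of the series.
mean13 <- zoo::rollapply(
  df1$V1,
  width = 13,
  FUN = function(x) {
    mean(na.omit(x))
  },
  align = "right",
  fill = NA,
  partial = TRUE
)
# Shift down one row so each row only sees the values before it.
# Row 1 has no predecessors, so its own value is used as a placeholder.
df1$V1_prev_mean <- c(df1$V1[1], head(mean13, -1))
# Replace each NA with the mean of its preceding window.
df1$V1 <- ifelse(is.na(df1$V1), df1$V1_prev_mean, df1$V1)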
Output (first 20 rows):
         V1    V2 V1_prev_mean
1  276.1200 45.36     276.1200
2   53.4000 47.16     276.1200
3   20.6400 55.08     164.7600
4  181.8000 49.56     116.7200
5  216.9600 12.96     132.9900
6   10.4400 58.68     149.7840
7   69.0000 39.36     126.5600
8  144.2400    NA     118.3371
9   10.3200  2.52     121.5750
10 239.7600  3.12     109.2133
11  79.3200  6.96     122.2680
12 257.6400 28.80     118.3636
13  28.5600    NA     129.9700
14 117.0000  9.12     122.1692
15 244.9200 39.48     109.9292
16 234.4800 57.24     124.6615
17 141.1108 43.92     141.1108 # <- this row filled
18 337.6800 47.52     137.7200
19  83.0400 24.60     147.7800
20 176.7600 28.68     153.8300
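The code above only fills V1, so V2 still contains NA in the output. Since the question asks for every variable, here is a minimal sketch that wraps the same rollapply approach in a function and applies it column by column. The name impute_prev_mean and the toy data frame are made up for illustration and are not part of zoo. As in the code above, the means are computed from the original values, so an imputed value does not feed into later windows. The first row is left as NA if it is missing, because it has no previous values to average.

```r
# Hypothetical helper (not part of zoo): replace each NA with the mean of
# up to `width` preceding values, ignoring other NAs in the window.
impute_prev_mean <- function(x, width = 13) {
  roll <- zoo::rollapply(
    x,
    width = width,
    FUN = function(w) mean(w, na.rm = TRUE),
    align = "right",
    fill = NA,
    partial = TRUE
  )
  # Shift by one so position i holds the mean of the values before i.
  prev <- c(NA, head(roll, -1))
  ifelse(is.na(x), prev, x)
}

# Toy data for illustration only; width = 2 keeps it easy to check by hand.
toy <- data.frame(a = c(10, 20, NA, 40), b = c(NA, 2, 4, NA))
toy[] <- lapply(toy, impute_prev_mean, width = 2)
print(toy)
```

For the data in the question, the equivalent call would be df1[c("V1", "V2")] <- lapply(df1[c("V1", "V2")], impute_prev_mean), which uses the default window of 13.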