Reputation: 33
I have a data frame df with a column of meter readings. Some values are sporadically missing (NA).
df excerpt:
row time meter_reading
1 03:10:00 26400
2 03:15:00 NA
3 03:20:00 27200
4 03:25:00 28000
5 03:30:00 NA
6 03:35:00 NA
7 03:40:00 30000
What I'm trying to do:
If an NA is isolated (a single NA between two observed values), I want to interpolate it (e.g. with na.interpolation for row 2). But if there are two or more consecutive NAs, I don't want R to interpolate; those values should stay NA (e.g. rows 5 and 6).
What I have tried so far is a for loop with an if condition. My approach:
library("imputeTS")
for(i in 1:(nrow(df))) {
  if(!is.na(df$meter_reading[i]) & is.na(df$meter_reading[i-1]) & !is.na(df$meter_reading[i-2])) {
    na_interpolation(df$meter_reading)
  }
}
Giving me :
Error in if (!is.na(df$meter_reading[i]) & is.na(df$meter_reading[i - :
argument is of length zero
Any ideas how to do it? Am I completely wrong here?
Thanks!
Upvotes: 0
Views: 126
Reputation: 7730
Just an addition here: the current version of the imputeTS package also has a maxgap option for each imputation algorithm, which solves this problem easily. It probably wasn't available yet when you asked this question.
Your code would look like this:
library("imputeTS")
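# assign the result back, otherwise the imputed values are only printed
df$meter_reading <- na_interpolation(df$meter_reading, maxgap = 1)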
This means gaps of 1 NA get imputed, while longer gaps of consecutive NAs remain NA.
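To check which positions belong to a gap of length 1 before imputing, the run lengths of the NAs can be inspected. This is only a base R sketch of the gap idea described above, not imputeTS's internal code; the names runs, run_len and short_gap are made up for illustration.

```r
# Meter readings from the question
x <- c(26400, NA, 27200, 28000, NA, NA, 30000)

# Run-length encode the NA pattern: consecutive NAs form one run
runs <- rle(is.na(x))

# For every element, the length of the run it belongs to
run_len <- rep(runs$lengths, runs$lengths)

# TRUE only for NAs that sit in a gap of at most 1 consecutive NA
short_gap <- is.na(x) & run_len <= 1

which(short_gap)
```

Here only position 2 is flagged; positions 5 and 6 form a gap of length 2 and would be left as NA.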
Upvotes: 0
Reputation: 2210
I don't know what your na.interpolation does, but if you take the mean of the previous and next rows, for example, you could do it with dplyr:
library(dplyr)
# lag()/lead() are NA next to another NA, so runs of 2+ NAs stay NA
df %>% mutate(x = ifelse(is.na(meter_reading),
                         (lag(meter_reading) + lead(meter_reading)) / 2,
                         meter_reading))
# row time meter_reading x
#1 1 03:10:00 26400 26400
#2 2 03:15:00 NA 26800
#3 3 03:20:00 27200 27200
#4 4 03:25:00 28000 28000
#5 5 03:30:00 NA NA
#6 6 03:35:00 NA NA
#7 7 03:40:00 30000 30000
Upvotes: 1
Reputation: 10385
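A quick look shows that your counter i starts at 1, and then you try to access the elements at index i-1 and i-2. For i = 1, df$meter_reading[0] is a zero-length vector, so the whole condition has length zero, which is exactly what the error message says.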
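A minimal sketch of a loop that avoids the out-of-range indices, in base R only. It is my own illustration, not the asker's final code: it assumes the readings are equally spaced in time, so linear interpolation of a single missing value is just the mean of its two neighbours, and the names x, filled and n are made up for the example.

```r
# Example data from the question
df <- data.frame(
  time = c("03:10:00", "03:15:00", "03:20:00", "03:25:00",
           "03:30:00", "03:35:00", "03:40:00"),
  meter_reading = c(26400, NA, 27200, 28000, NA, NA, 30000)
)

x <- df$meter_reading
filled <- x  # write results here so every check reads the original values
n <- length(x)

if (n >= 3) {
  # Run from 2 to n - 1 so that i - 1 and i + 1 are always valid indices
  for (i in 2:(n - 1)) {
    # Isolated NA: the value is missing but both neighbours are observed
    if (is.na(x[i]) && !is.na(x[i - 1]) && !is.na(x[i + 1])) {
      filled[i] <- (x[i - 1] + x[i + 1]) / 2
    }
  }
}

df$meter_reading <- filled
print(df)
```

Note the use of && instead of & inside if(): it evaluates a single condition and stops at the first FALSE. An NA in the first or last row is left untouched here because it has only one neighbour.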
Upvotes: 0