Peha

Reputation: 33

interpolation for limited number of NA

I have a data frame df with a column containing values (meter readings). Some values are sporadically missing (NA).

df excerpt:

row   time      meter_reading
1     03:10:00  26400
2     03:15:00  NA
3     03:20:00  27200
4     03:25:00  28000
5     03:30:00  NA
6     03:35:00  NA
7     03:40:00  30000

What I'm trying to do:

If there is only a single NA between two valid readings, I want to interpolate it (e.g. na.interpolation for row 2). But if there are two or more consecutive NAs, I don't want R to interpolate; those values should stay NA (e.g. rows 5 and 6).

What I have tried so far is a loop (for...) with an if-condition. My approach:

library("imputeTS")
for(i in 1:(nrow(df))) {
  if(!is.na(df$meter_reading[i]) & is.na(df$meter_reading[i-1]) & !is.na(df$meter_reading[i-2])) {
    na_interpolation(df$meter_reading) 
    }
}

This gives me:

Error in if (!is.na(df$meter_reading[i]) & is.na(df$meter_reading[i -  : 
  argument is of length zero

Any ideas how to do it? Am I completely wrong here?

Thanks!

Upvotes: 0

Views: 126

Answers (3)

Steffen Moritz

Reputation: 7730

Just an addition here: the current version of the imputeTS package has a maxgap option for each imputation algorithm, which easily solves this problem. It probably wasn't available yet when you asked this question.

Your code would look like this:

library("imputeTS")
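# Interpolate only the numeric column and store the result back in df
df$meter_reading <- na_interpolation(df$meter_reading, maxgap = 1)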

This means gaps of 1 NA get imputed, while longer gaps of consecutive NAs remain NA.
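For anyone who cannot use a recent imputeTS version, the same rule can be approximated in base R. The following is only a sketch, not the package's implementation: interpolate_short_gaps is a hypothetical helper name, it uses plain linear interpolation via approx(), and it assumes equally spaced readings and at least two non-NA values.

```r
# Sample readings from the question
x <- c(26400, NA, 27200, 28000, NA, NA, 30000)

# Hypothetical helper (not part of imputeTS): linearly interpolate only
# those runs of consecutive NAs that are at most `maxgap` long.
interpolate_short_gaps <- function(x, maxgap = 1) {
  runs <- rle(is.na(x))
  # For every element, the length of the run it belongs to
  run_len <- rep(runs$lengths, runs$lengths)
  fillable <- is.na(x) & run_len <= maxgap

  idx <- seq_along(x)
  known <- !is.na(x)
  # Linear interpolation at every position from the known points;
  # positions outside the observed range stay NA (approx default rule = 1)
  filled <- approx(idx[known], x[known], xout = idx)$y

  x[fillable] <- filled[fillable]
  x
}

result <- interpolate_short_gaps(x, maxgap = 1)
print(result)
```

With the sample data, row 2 is filled with 26800, while rows 5 and 6 stay NA because that gap is two values long.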

Upvotes: 0

Nicolas2

Reputation: 2210

I don't know what your na.interpolation does, but taking the mean of the previous and next rows, for example, you could do that with dplyr:

library(dplyr)
# lag()/lead() are NA next to another NA, so longer gaps stay NA
df %>% mutate(x=ifelse(is.na(meter_reading),
                       (lag(meter_reading)+lead(meter_reading))/2,
                       meter_reading))
#  row     time meter_reading     x
#1   1 03:10:00         26400 26400
#2   2 03:15:00            NA 26800
#3   3 03:20:00         27200 27200
#4   4 03:25:00         28000 28000
#5   5 03:30:00            NA    NA
#6   6 03:35:00            NA    NA
#7   7 03:40:00         30000 30000

Upvotes: 1

user2974951

Reputation: 10385

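A quick look shows that your counter i starts at 1 and then you try to get the index at i-1 and i-2. At i = 1 that means indexing with 0, which in R returns a zero-length vector, so the if condition has length zero and fails with the error you see.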
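One way the loop could be repaired is sketched below. It is an assumption-laden illustration rather than the asker's intended method: it replaces na_interpolation with the mean of the two neighbours (which matches linear interpolation only for equally spaced readings), checks the row itself instead of the row before it, and uses && so the condition is always a single TRUE or FALSE.

```r
# Sample data from the question
df <- data.frame(
  time = c("03:10:00", "03:15:00", "03:20:00", "03:25:00",
           "03:30:00", "03:35:00", "03:40:00"),
  meter_reading = c(26400, NA, 27200, 28000, NA, NA, 30000)
)

n <- nrow(df)
x <- df$meter_reading  # untouched copy, so filled values never act as neighbours

# Start at 2 and stop at n - 1 so that i - 1 and i + 1 are always valid indices
if (n >= 3) {
  for (i in 2:(n - 1)) {
    # Fill only an NA whose direct neighbours are both observed,
    # i.e. a gap of exactly one value
    if (is.na(x[i]) && !is.na(x[i - 1]) && !is.na(x[i + 1])) {
      df$meter_reading[i] <- (x[i - 1] + x[i + 1]) / 2
    }
  }
}

print(df)
# meter_reading is now 26400 26800 27200 28000 NA NA 30000
```

The first and last rows are never filled here, since they have only one neighbour to interpolate from.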

Upvotes: 0
