Federico Tallis

Reputation: 21

na.approx Interpolation in R

I'm using na.approx from the zoo package to fill NA values.

library(zoo)
Bus_data <- data.frame(
  Action = c("Boarding", "Alighting", NA, NA, "Boarding", "Alighting", NA, NA, "Boarding", "Alighting"),
  Distance = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  Time = c(1, 2, NA, NA, 5, 6, NA, NA, 9, 10)
)

I'd like the resulting data.frame to look like the following:

      Action Distance Time
1   Boarding        1    1
2  Alighting        1    2
3         NA        2  3.5
4         NA        2  3.5
5   Boarding        3    5
6  Alighting        3    6
7         NA        4  7.5
8         NA        4  7.5
9   Boarding        5    9
10 Alighting        5   10

However, when I use

na.approx(Bus_data$Time, Bus_data$Distance, ties = "ordered")

the Time values in rows 1, 5, and 9 change:

      Action Distance Time
1   Boarding        1    2 <- value changes
2  Alighting        1    2
3         NA        2  3.5
4         NA        2  3.5
5   Boarding        3    6 <- value changes
6  Alighting        3    6
7         NA        4  7.5
8         NA        4  7.5
9   Boarding        5   10 <- value changes
10 Alighting        5   10

Any idea how I could get the desired outcome with na.approx? Note that "Distance" is evenly spaced in this example only for simplicity; the real dataset has varying distances.

Upvotes: 1

Views: 966

Answers (2)

markus

Reputation: 26343

You can use approx from base R:

Time = c(1,2,NA,NA,5,6,NA,NA,9,10)
approx(Time, method = "constant", n = length(Time), f = .5)$y

Result

# [1]  1.0  2.0  3.5  3.5  5.0  6.0  7.5  7.5  9.0 10.0

From ?approx

f : for method = "constant" a number between 0 and 1 inclusive, indicating a compromise between left- and right-continuous step functions. If y0 and y1 are the values to the left and right of the point then the value is y0 if f == 0, y1 if f == 1, and y0*(1-f)+y1*f for intermediate values. In this way the result is right-continuous for f == 0 and left-continuous for f == 1, even for non-finite y values.
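
To see what f does on this data, here is a small sketch comparing the two extremes with the midpoint. The names left, right, and mid are only illustrative and are not part of the original answer.

```r
Time <- c(1, 2, NA, NA, 5, 6, NA, NA, 9, 10)

# f = 0: each gap takes the last known value to its left
left <- approx(Time, method = "constant", n = length(Time), f = 0)$y
# f = 1: each gap takes the next known value to its right
right <- approx(Time, method = "constant", n = length(Time), f = 1)$y
# f = 0.5: each gap takes the average of its two known neighbours
mid <- approx(Time, method = "constant", n = length(Time), f = 0.5)$y

left   # 1 2 2 2 5 6 6 6 9 10
right  # 1 2 5 5 5 6 9 9 9 10
mid    # 1.0 2.0 3.5 3.5 5.0 6.0 7.5 7.5 9.0 10.0
```

Note that this lines up with the original positions only because the first and last values are non-NA and n equals length(Time).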


With na.approx the call is similar, since extra arguments are passed on to approx:

library(zoo)
na.approx(Time, method = "constant", f = .5)

Upvotes: 3

akrun

Reputation: 887058

We could set the originally non-NA elements back to NA after na.approx, and then coalesce with the original column so that only the gaps are filled:

library(dplyr)
library(zoo)
coalesce(
  Bus_data$Time,
  replace(na.approx(Bus_data$Time, Bus_data$Distance, ties = "ordered"),
          !is.na(Bus_data$Time), NA)
)
#[1]  1.0  2.0  3.5  3.5  5.0  6.0  7.5  7.5  9.0 10.0
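
If you would rather not depend on dplyr or zoo, the same "keep the original, fill only the gaps" idea can be sketched in base R. This is a variant under my own assumptions, not what the answer above uses: it calls approx directly on the known points and interpolates linearly against Distance, so it should also cope with unevenly spaced distances. The names known, interp, and filled are illustrative.

```r
Distance <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
Time <- c(1, 2, NA, NA, 5, 6, NA, NA, 9, 10)

# Rows where Time was actually observed
known <- !is.na(Time)

# Linear interpolation of Time against Distance, using only the known rows.
# ties = "ordered" keeps the duplicated Distance values as they are.
interp <- approx(Distance[known], Time[known],
                 xout = Distance, ties = "ordered")$y

# Keep every observed value; use the interpolated value only in the gaps
filled <- ifelse(known, Time, interp)
filled
# [1]  1.0  2.0  3.5  3.5  5.0  6.0  7.5  7.5  9.0 10.0
```

Because the observed values are taken from Time itself, whatever approx returns at the tied Distance values never reaches the result.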

Upvotes: 2
