Miquel Vallbona
Miquel Vallbona

Reputation: 65

How to fill NAs based on values near them

I have a dataframe in which a column (x) has some missing values and I would like to create a new column (y) replacing these NAs with the nearest value already existing. Also I would like that if the distance is the same both ways, I get the mean of both.

Here is an example of what I mean:

data <- data.frame(x = c(2, 3, 3, NA, NA, 4, NA, 3, 2, NA, NA, NA, 4))

The dataframe I would like to obtain would be that:

x y
2 2
3 3
3 3
NA 3
NA 4
4 4
NA 3.5
3 3
2 2
NA 2
NA 3
NA 4
4 4

Thanks in advance

Upvotes: 1

Views: 66

Answers (1)

Ma&#235;l
Ma&#235;l

Reputation: 51914

Your case is slightly different from the usual cases, where interpolation is needed from the first NA. Here, interpolation is necessary only when the NA sequence is 1, 3, or more, and for the first and last value in a sequence of NA, you need the closest non-NA.

You can use lead and lag to do so, and then apply interpolation with zoo::na.approx.

library(dplyr)
library(zoo)
data %>% 
  mutate(y = case_when(is.na(x) & !is.na(lag(x)) & is.na(lead(x)) ~ lag(x),
                       is.na(x) & is.na(lag(x)) & !is.na(lead(x)) ~ lead(x),
                       TRUE ~ x) %>% 
           na.approx(.))

output

    x   y
1   2 2.0
2   3 3.0
3   3 3.0
4  NA 3.0
5  NA 4.0
6   4 4.0
7  NA 3.5
8   3 3.0
9   2 2.0
10 NA 2.0
11 NA 3.0
12 NA 4.0
13  4 4.0

Upvotes: 2

Related Questions