EllenQ

Reputation: 33

Replacing NA values using a rolling window

How can I replace an NA value with the average of the previous non-NA and next non-NA values? For example, I want to replace the first NA value with -0.873 (it has no previous non-NA value), and the 4th and 5th with the average of -0.497 and 53.200.

Thanks!

t <- c(NA, -0.873, -0.497, NA, NA, 53.200, NA, NA, NA, 26.100)

=================== ADD ON ===================

Thank you all for answering the question, and sorry for the late response! This is only part of a data frame (10000 × 91); I took just the first 10 rows of the first column to simplify the question. I think David's and MKR's answers give the result I expected.
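For the full 10000 × 91 data frame, the same neighbor-averaging rule can be applied column by column with `lapply()`. This is a minimal base-R sketch, assuming every column is numeric; `fill_neighbors()` and the small `df` are hypothetical names and made-up data, and the function body reuses the `findInterval()` steps from David Arenburg's answer.

```r
# Hypothetical helper: replace each NA with the mean of the nearest non-NA
# values before and after it (or the single available neighbor at an end)
fill_neighbors <- function(x) {
  na_vals <- is.na(x)
  obs <- x[!na_vals]
  # Position (among the non-NA values) of the last non-NA value before each NA
  start_ind <- findInterval(which(na_vals), which(!na_vals))
  # Position of the next non-NA value (past the end for trailing NAs -> NA)
  end_ind <- start_ind + 1L
  # Leading NAs have no previous non-NA value
  start_ind[start_ind == 0L] <- NA_integer_
  x[na_vals] <- rowMeans(cbind(obs[start_ind], obs[end_ind]), na.rm = TRUE)
  x
}

# Made-up two-column data frame standing in for the real 10000 x 91 one
df <- data.frame(
  a = c(NA, -0.873, -0.497, NA, NA, 53.200),
  b = c(1, NA, 3, NA, NA, NA)
)

# df[] <- keeps the data.frame class and column names
df[] <- lapply(df, fill_neighbors)
df
```

Assigning into `df[]` rather than `df` keeps the result a data frame instead of a plain list.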

Upvotes: 1

Views: 474

Answers (3)

David Arenburg

Reputation: 92292

Here's a possible vectorized approach using base R (some steps could probably be improved, but I don't have time to look into it right now):

x <- c(NA, -0.873, -0.497, NA, NA, 53.200, NA, NA, NA, 26.100)

# Store a logical vector of NA locations for further use
na_vals <- is.na(x)

# For each NA, find the position (among the non-NA values) of the last non-NA value before it
start_ind <- findInterval(which(na_vals), which(!na_vals))

# Create a right limit: the position of the next non-NA value
end_ind <- start_ind + 1L

# Replace zero positions (leading NAs with no previous non-NA value) with NA
start_ind[start_ind == 0L] <- NA_integer_

# Calculate the means and replace the NAs
x[na_vals] <- rowMeans(cbind(x[!na_vals][start_ind], x[!na_vals][end_ind]), na.rm = TRUE)
x
# [1] -0.8730 -0.8730 -0.4970 26.3515 26.3515 53.2000 39.6500 39.6500 39.6500 26.1000

This should work properly for NAs at both ends of the vector.
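To see why NAs at both ends are handled, here is a step-by-step trace of the same steps on a small made-up vector (not from the question) with a leading, a middle, and a trailing NA. `findInterval()` returns 0 for an NA that precedes every non-NA value, and indexing past the end of a vector in R returns NA; `rowMeans(..., na.rm = TRUE)` then falls back to the one available neighbor.

```r
# Made-up vector with a leading, a middle and a trailing NA
x <- c(NA, 1, NA, 3, NA)
na_vals <- is.na(x)

which(na_vals)   # NA positions:     1 3 5
which(!na_vals)  # non-NA positions: 2 4

start_ind <- findInterval(which(na_vals), which(!na_vals))
start_ind        # 0 1 2  (0 = no non-NA value before the leading NA)
end_ind <- start_ind + 1L
end_ind          # 1 2 3  (3 is past the last non-NA value)

obs <- x[!na_vals]
obs[end_ind]     # 1 3 NA (indexing past the end gives NA)

start_ind[start_ind == 0L] <- NA_integer_
obs[start_ind]   # NA 1 3

x[na_vals] <- rowMeans(cbind(obs[start_ind], obs[end_ind]), na.rm = TRUE)
x                # 1 1 2 3 3
```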

Upvotes: 2

MKR

Reputation: 20095

One dplyr and tidyr based solution could be:

  library(dplyr)
  library(tidyr)
  t <- c(NA, -0.873, -0.497, NA, NA, 53.200, NA, NA, NA, 26.100)

  data.frame(t) %>%
    # Copy t into two helper columns; fill() then completes them downward/upward
    mutate(last_nonNA = t, next_nonNA = t) %>%
    fill(last_nonNA) %>%
    fill(next_nonNA, .direction = "up") %>%
    mutate(t = case_when(
                        !is.na(t)  ~ t,
                        !is.na(last_nonNA) & !is.na(next_nonNA) ~ (last_nonNA + next_nonNA)/2,
                        is.na(last_nonNA) ~ next_nonNA,
                        is.na(next_nonNA) ~ last_nonNA
                        )
           ) %>%
    select(t)

  # t
  # 1  -0.8730
  # 2  -0.8730
  # 3  -0.4970
  # 4  26.3515
  # 5  26.3515
  # 6  53.2000
  # 7  39.6500
  # 8  39.6500
  # 9  39.6500
  # 10 26.1000

Note: It looks a bit complicated, but it does the trick. One can achieve the same thing with a for loop.
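A for-loop version of the same rule could look like the following minimal base-R sketch. It assumes the same behavior as above (average of the nearest non-NA values on each side, or the single available neighbor at either end of the vector); `fill_neighbors_loop()` is a hypothetical name.

```r
# Hypothetical loop-based helper implementing the same neighbor-average rule
fill_neighbors_loop <- function(x) {
  res <- x
  obs <- which(!is.na(x))  # positions of the observed (non-NA) values
  for (i in which(is.na(x))) {
    before <- obs[obs < i]
    after <- obs[obs > i]
    # Nearest observed value on each side, or NA if there is none
    prev_val <- if (length(before) > 0) x[max(before)] else NA
    next_val <- if (length(after) > 0) x[min(after)] else NA
    res[i] <- mean(c(prev_val, next_val), na.rm = TRUE)
  }
  res
}

t <- c(NA, -0.873, -0.497, NA, NA, 53.200, NA, NA, NA, 26.100)
fill_neighbors_loop(t)
```

It should give the same values as the dplyr/tidyr pipeline. Because it rescans `obs` for every NA, it is slower than the vectorized versions on long vectors, but each step is easy to follow.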

Upvotes: 1

De Novo

Reputation: 7610

This function imputes each NA value in a vector with the average of the non-NA values in a window that runs from the first element up to the element right after the NA.

t <- c(NA, -0.873, -0.497, NA, NA, 53.200, NA, NA, NA, 26.100)

roll_impute <- function(x){
    n <- length(x)
    res <- x
    for (i in seq_along(x)){
        if (is.na(x[i])){
            # Mean of the non-NA values in x[1..(i+1)]; the window is capped
            # at n so the vector is not recycled when the last element is NA
            res[i] <- mean(x[seq_len(min(i + 1, n))], na.rm = TRUE)
        }
    }
    res
}
roll_impute(t)
# [1] -0.87300 -0.87300 -0.49700 -0.68500 17.27667 53.20000 17.27667 17.27667 19.48250
# [10] 26.10000

roll_impute() includes code that corrects the rolling window when the final element is NA, so that the vector isn't recycled. That isn't the case in your example, but it is needed to generalize the function. Any improvements to this function would be welcome :) It does use a for loop, but it doesn't grow any vectors. I can't think of a simple way to avoid the for loop by relying on the structure of the objects right now.
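One possible way to drop the loop for this expanding-window rule is to use running sums and running counts of the non-NA values via `cumsum()`. This is a sketch, assuming the window for position i is x[1..(i+1)] capped at the last element; `roll_impute_vec()` is a hypothetical name, and results may differ from `mean()` in the last floating-point digits because the sums are accumulated differently.

```r
# Hypothetical vectorized version of the expanding-window imputation
roll_impute_vec <- function(x) {
  n <- length(x)
  # Running sum and running count of the non-NA values in x[1..k]
  csum <- cumsum(ifelse(is.na(x), 0, x))
  ccnt <- cumsum(!is.na(x))
  # The window for position i ends at i + 1, capped at n (no recycling)
  win_end <- pmin(seq_len(n) + 1L, n)
  win_mean <- csum[win_end] / ccnt[win_end]
  res <- x
  res[is.na(x)] <- win_mean[is.na(x)]
  res
}

t <- c(NA, -0.873, -0.497, NA, NA, 53.200, NA, NA, NA, 26.100)
roll_impute_vec(t)
```

If a window contains no non-NA values, the division is 0/0 and gives `NaN`, which is also what `mean()` returns for an empty vector after `na.rm = TRUE`.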

Upvotes: 2
