Reputation: 25
I am analyzing some variables for some specific locations. These variables have NA values that should be replaced by the values from adjacent columns. I found a way to do it, but it's not an efficient way if I have more columns. Can you help?
Dataset:
location <- rep(c("A", "B", "C"), times = 2)
v1 <- c(11,92,NA,NA,NA,NA)
v2 <- c(NA,NA,NA,50,NA,NA)
v3 <- c(NA,NA,66,NA,NA,79)
v4 <- c(NA,NA,NA,74,23,88)
df <- data.frame(location,v1,v2,v3,v4)
I tried this approach to create a new column in which NA values are replaced by values from other columns.
library (dplyr)
col_1 <- df %>% mutate(new_col = v1 %>% is.na %>% ifelse(v2,v1))
col_2 <- col_1 %>% mutate(new_col_1 = new_col %>% is.na %>% ifelse(v3,new_col))
col_3 <- col_2 %>% mutate(final_col = new_col_1 %>% is.na %>% ifelse(v4,new_col_1))
It solves the problem but I have two questions: 1. Is there any efficient way to do this, instead of creating three columns? 2. For some cases in v3 and v4 OR v2 and v4, where more than one value is available to replace NA, can I take the mean of those values to replace? How? Thanks in advance.
Upvotes: 1
Views: 179
Reputation: 887881
We can use coalesce
library(dplyr)
library(purrr)
df %>%
mutate(new = coalesce(!!! rlang::syms(names(.)[-1])))
Or
df %>%
mutate(new = reduce(.[-1], coalesce))
# location v1 v2 v3 v4 new
#1 A 11 NA NA NA 11
#2 B 92 NA NA NA 92
#3 C NA NA 66 NA 66
#4 A NA 50 NA 74 50
#5 B NA NA NA 23 23
#6 C NA NA 79 88 79
Upvotes: 1