Reputation: 1030
I am trying to replace NAs in a data.frame of many columns using another vector in which the replacement values for each column are given. I know how I could replace each value using a function, but not to find the value in another vector. I am searching for a dplyr approach:
For example:
require(dplyr)
test <- data.frame(A = c(1,2,3,NA), B = c(4,5,NA,2), C = c(NA,2,2,NA), D = c(1,2,3,4))
replace_na <- c(A = 100, B = 200, C = 300)
# Replace with median should be replace with look up value in vector based on the name of the vector or position
test %>% mutate_each_(funs(replace(., is.na(.), median(.,na.rm = T))), names(replace_na))
expected_result <- data.frame(A = c(1,2,3,100), B = c(4,5,200,2), C = c(300,2,2,300), D = c(1,2,3,4))
> expected_result
A B C D
1 1 4 300 1
2 2 5 2 2
3 3 200 2 3
4 100 2 300 4
Upvotes: 2
Views: 1721
Reputation: 4187
It is as easy as using replace_na
function from tidyr
-package:
library(tidyr)
test %>% replace_na(as.list(replacements))
The output:
A B C D
1 1 4 300 1
2 2 5 2 2
3 3 200 2 3
4 100 2 300 4
This function needs a list for which columns the NA's you want to replace. So, it is possible to replace for only selected columns. Example:
replacements2 <- list(B = 200, C = 300)
test %>% replace_na(replacements2)
output:
A B C D
1 1 4 300 1
2 2 5 2 2
3 3 200 2 3
4 NA 2 300 4
As you can see, only the NA's for the B and C columns are replaced.
Data:
test <- data.frame(A = c(1,2,3,NA), B = c(4,5,NA,2), C = c(NA,2,2,NA), D = c(1,2,3,4))
replacements <- c(A = 100, B = 200, C = 300)
Upvotes: 4
Reputation: 886938
We can use Map
from base R
test[names(replace_na)] <- Map(function(x,y)
replace(x, is.na(x), y), test[names(replace_na)], replace_na)
test
# A B C D
#1 1 4 300 1
#2 2 5 2 2
#3 3 200 2 3
#4 100 2 300 4
Or with tidyverse
library(tidyverse)
test %>%
select_at(names(replace_na)) %>%
map2_df(., replace_na, ~replace(., is.na(.), .y)) %>%
bind_cols(., select_at(test, setdiff(names(test), names(replace_na))))
# A tibble: 4 x 4
# A B C D
# <dbl> <dbl> <dbl> <dbl>
#1 1 4 300 1
#2 2 5 2 2
#3 3 200 2 3
#4 100 2 300 4
Or with set
from data.table
library(data.table)
setDT(test)
for(j in names(replace_na)){
set(test, i = which(is.na(test[[j]])), j = j, value = replace_na[j])
}
test
# A B C D
#1: 1 4 300 1
#2: 2 5 2 2
#3: 3 200 2 3
#4: 100 2 300 4
Upvotes: 1