Tobias Dekker
Tobias Dekker

Reputation: 1030

Replace NA each column based on another vector using dplyr

I am trying to replace NAs in a data.frame of many columns using another vector in which the replacement values for each column are given. I know how I could replace each value using a function, but not to find the value in another vector. I am searching for a dplyr approach:

For example:

require(dplyr)
test <- data.frame(A = c(1,2,3,NA), B = c(4,5,NA,2), C = c(NA,2,2,NA), D = c(1,2,3,4))
replace_na <- c(A = 100, B = 200, C = 300)
# Replace with median should be replace with look up value in vector based on the name of the vector or position
test %>% mutate_each_(funs(replace(., is.na(.), median(.,na.rm = T))), names(replace_na))
expected_result <- data.frame(A = c(1,2,3,100), B = c(4,5,200,2), C = c(300,2,2,300), D = c(1,2,3,4))
> expected_result
    A   B   C  D
1   1   4 300  1
2   2   5   2  2
3   3 200   2  3
4 100   2 300  4

Upvotes: 2

Views: 1721

Answers (2)

h3rm4n
h3rm4n

Reputation: 4187

It is as easy as using replace_na function from tidyr-package:

library(tidyr)
test %>% replace_na(as.list(replacements))

The output:

    A   B   C D
1   1   4 300 1
2   2   5   2 2
3   3 200   2 3
4 100   2 300 4

This function needs a list for which columns the NA's you want to replace. So, it is possible to replace for only selected columns. Example:

replacements2 <- list(B = 200, C = 300)
test %>% replace_na(replacements2)

output:

   A   B   C D
1  1   4 300 1
2  2   5   2 2
3  3 200   2 3
4 NA   2 300 4

As you can see, only the NA's for the B and C columns are replaced.

Data:

test <- data.frame(A = c(1,2,3,NA), B = c(4,5,NA,2), C = c(NA,2,2,NA), D = c(1,2,3,4))
replacements <- c(A = 100, B = 200, C = 300)

Upvotes: 4

akrun
akrun

Reputation: 886938

We can use Map from base R

test[names(replace_na)] <- Map(function(x,y) 
      replace(x, is.na(x), y), test[names(replace_na)], replace_na)
test
#    A   B   C D
#1   1   4 300 1
#2   2   5   2 2
#3   3 200   2 3
#4 100   2 300 4

Or with tidyverse

library(tidyverse)
test %>% 
   select_at(names(replace_na)) %>%
   map2_df(., replace_na, ~replace(., is.na(.), .y)) %>%
   bind_cols(., select_at(test, setdiff(names(test), names(replace_na))))
# A tibble: 4 x 4
#      A     B     C     D
#  <dbl> <dbl> <dbl> <dbl>
#1     1     4   300     1
#2     2     5     2     2
#3     3   200     2     3
#4   100     2   300     4

Or with set from data.table

library(data.table)
setDT(test)
for(j in names(replace_na)){
  set(test, i = which(is.na(test[[j]])), j = j, value = replace_na[j])
 }

test
#     A   B   C D
#1:   1   4 300 1
#2:   2   5   2 2
#3:   3 200   2 3
#4: 100   2 300 4

Upvotes: 1

Related Questions