Nicosc

Reputation: 323

Setting multiple values to NA with dplyr

I have a data frame from a survey in which the codes used for missing values vary between columns. Some questions use only "97", while others use "98", "99", "99999", etc. What I want is a fast and simple way to check each column for any of these missing-value codes and set all of them to NA. I found a solution on this website that works for single columns, but there must be a more efficient way?

Here is an example of my data set, which contains two different missing-value codes (98 and 99):

  safety_ensured social_trust approval_gov empl_opp gap_rich_poor
           <dbl>        <dbl>        <dbl>    <dbl>         <dbl>
1              3           98           99       NA             2
2             99           98           99        3            98
3              2           98           99       98            98
4              3           98           99        3             3
5              3           98           99        1            98

I found a solution here that uses dplyr and a helper function, but when I apply it, my data frame is turned into a list.

is_na <- function(x){
  return(as.character(x) %in% c("96", "97", "98", "99", "99999")) 
}
dataset <- dataset %>%
  lapply(is_na)

Greetings

Upvotes: 4

Views: 1550

Answers (1)

akrun

Reputation: 887008

We can create a vector of the missing-value codes, then use mutate() with across() (available from dplyr 1.0.0) to replace, in every column (everything() selects all columns), each value that matches 'vec' (tested with %in%) with NA.

library(dplyr)
# codes that should be treated as missing
vec <- c(96:99, 99999)
# returns a new data frame; assign the result to keep it
dataset %>%
  mutate(across(everything(), ~ replace(., . %in% vec, NA)))
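
To check the behaviour on a small case, here is a minimal sketch that rebuilds a few rows of the question's example and applies the same mutate/across call. The toy values and the name `cleaned` are assumptions made for illustration; only three of the original columns are reproduced.

```r
library(dplyr)

# Toy data mirroring the first three rows of three columns from the question
dataset <- tibble(
  safety_ensured = c(3, 99, 2),
  social_trust   = c(98, 98, 98),
  empl_opp       = c(NA, 3, 98)
)

# Codes that should be treated as missing
vec <- c(96:99, 99999)

# Replace every matching value with NA, column by column;
# the result is still a tibble, not a list
cleaned <- dataset %>%
  mutate(across(everything(), ~ replace(.x, .x %in% vec, NA)))

print(cleaned)
```

Because mutate() returns a data frame, the structure is preserved, unlike piping the data frame into lapply(), which returns a plain list. Values that are already NA are left untouched, since NA %in% vec is FALSE.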

Upvotes: 5
