melbez

Reputation: 1000

Replace values with new value if a condition is met or keep value the same if not, in R

I am using a dataset where the missing values for variables are specified with specific numbers. I am trying to create one dataframe where I replace these values with blanks and another dataframe where I replace them with NA's. For this question, I will focus on the dataframe where they are replaced with NA's.

For the variables, missing values are coded as 8 or 9. I think I could use mutate_at() to change all of them, or possibly an apply() function, but I am open to any suggestions. The general logic I am trying to write is: for each specified column, if the value is 8 or 9, replace it with NA (or a blank, for the other dataframe); otherwise, keep the value the same.

The dataset is structured so that each column represents one variable. I am trying to select a subset of the variables from the dataframe since only a few columns have missing values. I have looked at this example, but it doesn't completely answer my question.

I know I could do something like this, but it would require me to list every other non-missing value in the dataframe. I would prefer a solution where I only specify what happens to the 8's and 9's (the missing values) and everything else stays the same without being listed.

mutate_at(vars(card, lung, diabetes), function(x) case_when (x == 8 ~ "NA", x == 9 ~ "NA", x == 6 ~ 6, x == 4 ~ 4, x == 3 ~ 3, x == 2 ~ 2, x == 1 ~ 1))

Upvotes: 2

Views: 3842

Answers (3)

efz

Reputation: 435

In one simple line (apply() over rows returns its result transposed, hence the t()):

t(apply(your.data.frame, 1, function(x) ifelse(x == 8 | x == 9, NA, x)))

For example:

# byrow = TRUE fills the matrix row by row, which is what the output below shows
your.data.frame <- matrix(c(12, 3, 4, 5, 6, 78, 8, 11, 8, 9, 2, 45, 65.6, 6, 7, 8, 9, 12),
                          ncol = 3, byrow = TRUE)
new.data.frame <- t(apply(your.data.frame, 1, function(x) ifelse(x == 8 | x == 9, NA, x)))
new.data.frame
     [,1] [,2] [,3]
[1,] 12.0    3    4
[2,]  5.0    6   78
[3,]   NA   11   NA
[4,]   NA    2   45
[5,] 65.6    6    7
[6,]   NA   NA   12
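
Because comparisons in R are vectorised over a whole matrix, the same result can probably be had without apply() at all. A minimal sketch, assuming the data really is a numeric matrix as in the example (the name m is only illustrative):

```r
# Hypothetical example matrix, filled row by row
m <- matrix(c(12, 3, 4, 5, 6, 78, 8, 11, 8, 9, 2, 45, 65.6, 6, 7, 8, 9, 12),
            ncol = 3, byrow = TRUE)

# %in% returns a logical vector in column-major order,
# which indexes the matrix element by element
m[m %in% c(8, 9)] <- NA
m
```

Note that this touches every column; for a data frame where only some columns use 8 and 9 as missing codes, the columns would need to be selected first.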

Upvotes: 0

akrun

Reputation: 886938

Here, we can use replace() inside mutate_at(), so only the 8's and 9's are changed:

library(dplyr)
df1 %>%
      mutate_at(vars(card, lung, diabetes), ~ replace(., . %in% 8:9, NA))
#  card lung diabetes val
#1   NA    1        1   1
#2   NA    3        4   2
#3    1   NA        3   3
#4    2   NA        5   4
#5    3   NA       NA   5

Or, if we use case_when, any value that matches no condition becomes NA by default, so we only need a condition that keeps the values that are not 8 or 9:

df1 %>%
      mutate_at(vars(card, lung, diabetes),  ~ case_when(! . %in% 8:9 ~ .))
#  card lung diabetes val
#1   NA    1        1   1
#2   NA    3        4   2
#3    1   NA        3   3
#4    2   NA        5   4
#5    3   NA       NA   5

Another option is na_if, applied once for each missing-value code:

df1 %>%
    mutate_at(vars(card, lung, diabetes), ~ na_if(., 8) %>% na_if(., 9))
#  card lung diabetes val
#1   NA    1        1   1
#2   NA    3        4   2
#3    1   NA        3   3
#4    2   NA        5   4
#5    3   NA       NA   5

Data:

df1 <- data.frame(card = c(8, 9, 1, 2, 3), lung = c(1, 3, 8, 9, 8),
     diabetes = c(1, 4, 3, 5, 8), val = 1:5)
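
In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A minimal sketch of the first option written in that style, assuming dplyr >= 1.0.0 is installed (the name out is only illustrative):

```r
library(dplyr)

# Same example data as in this answer
df1 <- data.frame(card = c(8, 9, 1, 2, 3), lung = c(1, 3, 8, 9, 8),
                  diabetes = c(1, 4, 3, 5, 8), val = 1:5)

# across() applies the function to the chosen columns only;
# val is left untouched
out <- df1 %>%
  mutate(across(c(card, lung, diabetes), ~ replace(.x, .x %in% 8:9, NA)))
out
```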

Upvotes: 3

Thomas Rosa

Reputation: 732

In base R:

cols = c('card', 'lung', 'diabetes')  # columns that use 8/9 as missing codes
temp = df[, cols]
temp[temp == 8 | temp == 9] = NA      # logical matrix indexing sets matching cells to NA
df[, cols] = temp
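
For completeness, here is a self-contained sketch of the same idea using lapply(), which also covers the "blank" variant mentioned in the question. The example data is borrowed from the dplyr answer, and the names df_na and df_blank are only illustrative. It assumes a blank means an empty string, which forces those columns to become character:

```r
# Example data; df stands in for the asker's data frame
df <- data.frame(card = c(8, 9, 1, 2, 3), lung = c(1, 3, 8, 9, 8),
                 diabetes = c(1, 4, 3, 5, 8), val = 1:5)
cols <- c("card", "lung", "diabetes")

# NA version: replace the codes 8 and 9 column by column
df_na <- df
df_na[cols] <- lapply(df_na[cols], function(x) replace(x, x %in% c(8, 9), NA))

# Blank version: a numeric column cannot hold "", so convert to character first
df_blank <- df
df_blank[cols] <- lapply(df_blank[cols], function(x) {
  x <- as.character(x)
  x[x %in% c("8", "9")] <- ""
  x
})
```

The NA version keeps the columns numeric, so it is usually the better choice for analysis; the blank version is mainly useful for export.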

Upvotes: 0
