Reputation: 1000
I am using a dataset where the missing values for variables are specified with specific numbers. I am trying to create one dataframe where I replace these values with blanks and another dataframe where I replace them with NA's. For this question, I will focus on the dataframe where they are replaced with NA's.
For the variables, missing values are specified by the numbers 8 or 9. I feel like I could use mutate_at() to change all of them or possibly an apply() function, but I am open to any suggestions. The general logic I am trying to write is: for each specified column, if the value is 8 or 9, replace with blank, else keep the value the same.
The dataset is structured so that each column represents one variable. I am trying to select a subset of the variables from the dataframe since only a few columns have missing values. I have looked at this example, but it doesn't completely answer my question.
I know I could do something like this, but it would require me specifying the values of all the other values non-missing values in the dataframe. I would prefer a solution where I can specify what happens to 8's and 9's (the missing values) and can keep the others the same without listing them out.
mutate_at(vars(card, lung, diabetes), function(x) case_when (x == 8 ~ "NA", x == 9 ~ "NA", x == 6 ~ 6, x == 4 ~ 4, x == 3 ~ 3, x == 2 ~ 2, x == 1 ~ 1))
Upvotes: 2
Views: 3842
Reputation: 435
in one simple line
apply(your.data.frame, 1, function(x){ifelse(x==8| x==9, NA,x)})
thus:
your.data.frame <- matrix(c(12,3,4,5,6,78,8,11,8,9, 2,45,65.6,6,7,8,9,12 ), ncol=3)
new.data.frame <- t(apply(your.data.frame, 1, function(x){ifelse(x==8| x==9, NA,x)}))
new.data.frame
[,1] [,2] [,3]
[1,] 12.0 3 4
[2,] 5.0 6 78
[3,] NA 11 NA
[4,] NA 2 45
[5,] 65.6 6 7
[6,] NA NA 12
Upvotes: 0
Reputation: 886938
Here, we need
library(dplyr)
df1 %>%
mutate_at(vars(card, lung, diabetes), ~ replace(., . %in% 8:9, NA))
# card lung diabetes val
#1 NA 1 1 1
#2 NA 3 4 2
#3 1 NA 3 3
#4 2 NA 5 4
#5 3 NA NA 5
Or if we use case_when
by default the TRUE
is NA
, so the condition can be
df1 %>%
mutate_at(vars(card, lung, diabetes), ~ case_when(! . %in% 8:9 ~ .))
# card lung diabetes val
#1 NA 1 1 1
#2 NA 3 4 2
#3 1 NA 3 3
#4 2 NA 5 4
#5 3 NA NA 5
Or another option is na_if
df1 %>%
mutate_at(vars(card, lung, diabetes), ~ na_if(., 8) %>% na_if(.,9))
# card lung diabetes val
#1 NA 1 1 1
#2 NA 3 4 2
#3 1 NA 3 3
#4 2 NA 5 4
#5 3 NA NA 5
df1 <- data.frame(card = c(8, 9, 1, 2, 3), lung = c(1, 3, 8, 9, 8),
diabetes = c(1, 4, 3, 5, 8), val = 1:5)
Upvotes: 3
Reputation: 732
In base R:
cols = c('card', 'lung', 'diabetes')
temp = df[, cols]
temp[temp == 8 | temp == 9] = NA
df[, cols] = temp
Upvotes: 0