I have a data frame which looks like this:
    a    b    c    d
10  yes  yes  yes  yes
11  yes  yes  yes  yes
12  yes  yes  yes  yes
13  yes  yes  yes  yes
14  no   <NA> no   no
15  no   <NA> no   no
16  no   <NA> no   no
17  no   <NA> no   no
18  no   <NA> no   no
19  no   <NA> no   no
20  no   <NA> no   no
I have an ifelse() statement that creates a new column with the value 1 or 0 depending on whether the answers in all the previous columns are yes or no. However, my code does not account for NAs. This is the code I have used:
library(dplyr)
y <- x %>%
  mutate(
    health_ever = ifelse(
      e == 'yes ' |
        b == 'yes' |
        c == 'yes' |
        d == 'yes',
      1,
      0
    )
  )
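For illustration, here is a minimal sketch of why the NAs leak through this condition. It uses a made-up two-row version of the data and assumes e was meant to be a. TRUE | NA is TRUE, but FALSE | NA is NA, and ifelse() returns NA wherever its test is NA.

```r
# Made-up two-row version of the data; column a keeps the trailing space in "yes "/"no "
x <- data.frame(
  a = factor(c("yes ", "no ")),
  b = factor(c("yes", NA)),
  c = factor(c("yes", "no")),
  d = factor(c("yes", "no"))
)

# Assumes `e` in the original code was meant to be `a`
cond <- x$a == "yes " | x$b == "yes" | x$c == "yes" | x$d == "yes"
print(cond)                # TRUE, NA: FALSE | NA | FALSE | FALSE is NA
print(ifelse(cond, 1, 0))  # 1, NA: ifelse() passes the NA through
```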
Here is the code to reproduce the data:
x <- structure(
  list(
    a = structure(
      c(6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L),
      .Label = c("missing", "inapplicable", "proxy respondent ", "refusal",
                 "don't know", "yes ", "no "),
      class = "factor"
    ),
    b = structure(
      c(6L, 6L, 6L, 6L, NA, NA, NA, NA, NA, NA, NA),
      .Label = c("missing", "inapplicable", "proxy", "refusal",
                 "don't know", "yes", "no"),
      class = "factor"
    ),
    c = structure(
      c(6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L),
      .Label = c("missing", "inapplicable", "proxy", "refusal",
                 "don't know", "yes", "no"),
      class = "factor"
    ),
    d = structure(
      c(6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L),
      .Label = c("missing", "inapplicable", "proxy", "refusal",
                 "don't know", "yes", "no"),
      class = "factor"
    )
  ),
  row.names = 10:20,
  class = "data.frame"
)
How can I change my code to ignore any NAs and still return 1 or 0 based on the other columns? This is my desired output:
   a    b    c    d    e
1  yes  yes  yes  yes  1
2  yes  yes  yes  yes  1
3  yes  yes  yes  yes  1
4  yes  yes  yes  yes  1
5  no   <NA> no   no   0
6  no   <NA> no   no   0
7  no   <NA> no   no   0
8  no   <NA> no   no   0
Using rowSums on the logical matrix returned by is.na(x) gives the number of NAs in each row; a count of 0 means the row has no NA. Negating the counts with ! turns 0 into TRUE and every other count into FALSE. Then as.integer() or the unary + coerces the result to binary, i.e. TRUE => 1 and FALSE => 0:
x$e <- +(!rowSums(is.na(x)))
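The one-liner above can be unpacked into its intermediate values. This is a sketch on a small made-up data frame; the column values are illustrative only.

```r
x <- data.frame(a = c("yes", "no"), b = c("yes", NA))

na_mat <- is.na(x)           # logical matrix, TRUE where a value is missing
na_count <- rowSums(na_mat)  # number of NAs in each row (0 and 1 here)
no_na <- !na_count           # 0 becomes TRUE, any other count becomes FALSE
flag <- +no_na               # unary + turns TRUE/FALSE into 1L/0L
print(flag)
```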
This only checks whether a row contains an NA, though. The OP's code checks for 'yes' values, which can also be done with rowSums:
x$e <- +(rowSums(x == 'yes', na.rm = TRUE) > 0)
i.e. count the 'yes' values in each row (na.rm = TRUE drops the NAs from the count), convert to logical by checking whether the count is greater than 0, and coerce it to binary with +.
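A small made-up example of why na.rm = TRUE matters here: comparing a data frame with 'yes' gives a logical matrix that is NA wherever the data are NA, and without na.rm that NA propagates into the row count.

```r
x <- data.frame(b = c("yes", NA), c = c("yes", "no"))

hits <- x == "yes"                     # logical matrix; NA where x is NA
print(rowSums(hits))                   # the second row is NA without na.rm
counts <- rowSums(hits, na.rm = TRUE)  # NAs are skipped: 2 and 0
any_yes <- +(counts > 0)               # 1 if the row has at least one 'yes'
print(any_yes)
```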
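If we want to check whether all the columns are 'yes', comparing the count with ncol(x) does not work on this data: an NA cell can never count as 'yes', column a stores "yes " with a trailing space, and if e has already been added, ncol(x) counts it too. Instead, trim the values of the original columns and compare the number of 'yes' values with the number of non-NA values in each row:
yes <- sapply(x[c("a", "b", "c", "d")], function(v) trimws(v) == 'yes')
x$e <- +(rowSums(yes, na.rm = TRUE) == rowSums(!is.na(yes)))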
Output:
x
#    a    b    c    d    e
#10  yes  yes  yes  yes  1
#11  yes  yes  yes  yes  1
#12  yes  yes  yes  yes  1
#13  yes  yes  yes  yes  1
#14  no   <NA> no   no   0
#15  no   <NA> no   no   0
#16  no   <NA> no   no   0
#17  no   <NA> no   no   0
#18  no   <NA> no   no   0
#19  no   <NA> no   no   0
#20  no   <NA> no   no   0
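The 'all yes' check can also be wrapped in a small reusable function. This is only a sketch: flag_all_yes is a hypothetical name, not part of any package, and the three-row data frame is made up so that one row has an NA in b but is otherwise all 'yes'.

```r
# flag_all_yes() is a hypothetical helper, not from any package
flag_all_yes <- function(df, cols) {
  trimmed <- df[cols]
  # trimws() returns character (also for factors) and drops stray spaces such as "yes "
  trimmed[] <- lapply(trimmed, trimws)
  yes <- trimmed == "yes"              # logical matrix, NA where the data are NA
  n_yes <- rowSums(yes, na.rm = TRUE)  # 'yes' answers per row
  n_answered <- rowSums(!is.na(yes))   # non-NA answers per row
  # 1 when every answered column is 'yes'; a row that is entirely NA also gives 1
  +(n_yes == n_answered)
}

x <- data.frame(
  a = factor(c("yes ", "no ", "yes ")),
  b = factor(c("yes", NA, NA)),
  c = factor(c("yes", "no", "yes")),
  d = factor(c("yes", "no", "yes"))
)
x$e <- flag_all_yes(x, c("a", "b", "c", "d"))
print(x)
```

Comparing the data frame after trimming, rather than building the matrix with sapply(), keeps the result a matrix even for a one-row input, which rowSums() requires.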
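In the OP's code, e == 'yes ' refers to a column 'e', which does not exist in the initial dataset; perhaps 'a' was intended. Also note that 'yes ' has a trailing (not leading) space. That actually matches column a, whose factor levels are "yes " and "no ", whereas b, c and d use "yes" and "no". A plain x == 'yes' therefore never matches column a, so trim the values with trimws() before comparing.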
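A sketch of the 'any yes' idea written the OP's way, in base R so it runs without dplyr, and assuming e was meant to be a. Two changes do the work: trimws() removes the trailing space in column a, and %in% returns FALSE rather than NA for missing values, so the | chain never becomes NA. The same condition should also work unchanged inside the OP's mutate() call. The two-row data frame is made up.

```r
x <- data.frame(
  a = factor(c("yes ", "no ")),
  b = factor(c("yes", NA)),
  c = factor(c("yes", "no")),
  d = factor(c("yes", "no"))
)

# The trailing space in column a's levels means a plain comparison never matches
print(levels(x$a))
print(x$a == "yes")  # FALSE FALSE

# %in% gives FALSE (not NA) for missing values, so the OR chain stays TRUE/FALSE
x$health_ever <- ifelse(
  trimws(x$a) %in% "yes" |
    x$b %in% "yes" |
    x$c %in% "yes" |
    x$d %in% "yes",
  1, 0
)
print(x$health_ever)
```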