H.B

Reputation: 55

How can I ignore NA's across multiple columns in an if else statement in R?

I have a data frame which looks like this:

     a    b   c   d
10 yes      yes yes yes
11 yes      yes yes yes
12 yes      yes yes yes
13 yes      yes yes yes
14 no      <NA>  no  no
15 no      <NA>  no  no
16 no      <NA>  no  no
17 no      <NA>  no  no
18 no      <NA>  no  no
19 no      <NA>  no  no
20 no      <NA>  no  no

I have an ifelse statement that creates a new column with the value 1 or 0, based on whether the answers in all the previous columns are yes or no. However, my code does not account for NAs. This is the code I have used:

y <- x %>%
  mutate(
    health_ever = ifelse(
      e == 'yes    ' |
        b == 'yes' |
        c == 'yes' |
        d == 'yes',
      1,
      0
    )
  )

Here is the code to reproduce it:

x<-structure(
  list(
    a = structure(
      c(6L, 6L, 6L, 6L, 7L, 7L,
        7L, 7L, 7L, 7L, 7L),
      .Label = c(
        "missing",
        "inapplicable",
        "proxy respondent       ",
        "refusal",
        "don't know",
        "yes    ",
        "no     "
      ),
      class = "factor"
    ),
    b = structure(
      c(6L, 6L, 6L, 6L, NA, NA, NA, NA, NA,
        NA, NA),
      .Label = c(
        "missing",
        "inapplicable",
        "proxy",
        "refusal",
        "don't know",
        "yes",
        "no"
      ),
      class = "factor"
    ),
    c = structure(
      c(6L,
        6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L),
      .Label = c(
        "missing",
        "inapplicable",
        "proxy",
        "refusal",
        "don't know",
        "yes",
        "no"
      ),
      class = "factor"
    ),
    d = structure(
      c(6L, 6L,
        6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L),
      .Label = c(
        "missing",
        "inapplicable",
        "proxy",
        "refusal",
        "don't know",
        "yes",
        "no"
      ),
      class = "factor"
    )
  ),
  row.names = 10:20,
  class = "data.frame"
)

How can I change my code so that it ignores any NAs and still returns 1 or 0 based on the other columns? This is my desired output:

     a            b        c        d            e
1   yes          yes      yes      yes           1
2   yes          yes      yes      yes           1
3   yes          yes      yes      yes           1
4   yes          yes      yes      yes           1
5   no          <NA>       no       no           0
6   no          <NA>       no       no           0
7   no          <NA>       no       no           0
8   no          <NA>       no       no           0

Upvotes: 2

Views: 115

Answers (1)

akrun

Reputation: 886928

Using rowSums on the logical matrix from is.na returns the number of NAs in each row. If it returns 0, there is no NA in that row. Negating (!) converts this to logical, changing 0 to TRUE and all other values to FALSE. Then as.integer or unary + coerces it to binary, i.e. TRUE => 1 and FALSE => 0:

x$e <- +(!rowSums(is.na(x)))

The OP's code checks for 'yes' values, which can also be done with rowSums:

x$e <- +(rowSums(x == 'yes', na.rm = TRUE) > 0)

i.e. count the 'yes' values in each row, dropping the NAs with na.rm = TRUE, convert to logical by checking whether the count is greater than 0, and coerce it to binary with +.
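To make the intermediate steps visible, here is a minimal sketch on a small made-up data frame (df and its values are hypothetical, not the OP's data). It only illustrates how the NAs drop out of the count.

```r
# Hypothetical toy data, used only to show each intermediate step
df <- data.frame(
  a = c("yes", "no", NA),
  b = c("no", NA, NA),
  stringsAsFactors = TRUE
)

is_yes <- df == "yes"                    # logical matrix; NA cells stay NA
n_yes  <- rowSums(is_yes, na.rm = TRUE)  # NA cells contribute 0 to the count
flag   <- +(n_yes > 0)                   # TRUE/FALSE coerced to 1/0

flag
# 1 2 3
# 1 0 0
```

Note that a row consisting only of NAs ends up as 0 with this approach, not NA.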

If we want to check whether all the columns are 'yes':

x$e <- +(rowSums(x == 'yes', na.rm = TRUE) == ncol(x))

 

-output

x
#         a    b   c   d e
#10 yes      yes yes yes 1
#11 yes      yes yes yes 1
#12 yes      yes yes yes 1
#13 yes      yes yes yes 1
#14 no      <NA>  no  no 0
#15 no      <NA>  no  no 0
#16 no      <NA>  no  no 0
#17 no      <NA>  no  no 0
#18 no      <NA>  no  no 0
#19 no      <NA>  no  no 0
#20 no      <NA>  no  no 0
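One caveat about the all-columns check (== ncol(x)): a row containing an NA can never reach ncol(x), and if e has already been added to x, ncol(x) counts that column too. A possible variation, sketched here on hypothetical toy data with an explicitly chosen column vector cols (my own name, not from the question), compares the count against either the number of selected columns or the number of non-NA values in the row.

```r
# Hypothetical toy data; 'cols' names the columns to test explicitly
df <- data.frame(
  a = c("yes", "yes", "no"),
  b = c("yes", NA, NA),
  stringsAsFactors = TRUE
)
cols <- c("a", "b")

n_yes <- rowSums(df[cols] == "yes", na.rm = TRUE)

# Strict: every selected column must be 'yes', so an NA makes the row 0
strict <- +(n_yes == length(cols))

# Lenient: every non-NA value must be 'yes'
# (a row with only NAs would also get 1 here)
lenient <- +(n_yes == rowSums(!is.na(df[cols])))

unname(strict)   # 1 0 0
unname(lenient)  # 1 1 0
```

Which of the two is appropriate depends on whether an NA should count as "not yes" or simply be skipped.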

In the OP's code, there is trailing whitespace in e == 'yes    ', and 'e' is not a column in the initial dataset; perhaps 'a' was intended.
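Because the levels of column a in the dput output carry trailing whitespace ("yes    ", "no     "), a comparison with 'yes' is FALSE for that column. One way to handle this, shown as a sketch on hypothetical toy data rather than as the only option, is to trim the factor levels with trimws before comparing.

```r
# Hypothetical toy data mimicking the padded levels in column 'a'
df <- data.frame(
  a = factor(c("yes    ", "no     ")),
  b = factor(c("yes", NA))
)

df$a == "yes"
# [1] FALSE FALSE   (padded levels do not match)

# Trim whitespace from the levels of every factor column
df[] <- lapply(df, function(col) {
  levels(col) <- trimws(levels(col))
  col
})

e <- +(rowSums(df == "yes", na.rm = TRUE) > 0)
unname(e)
# [1] 1 0
```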

Upvotes: 1
