Alternatives to apply same condition to multiple variables inside case_when function

Question

I am looking for a more efficient or elegant way to apply the same condition to multiple columns inside case_when().

I am creating a dummy column based on multiple conditions across specific columns of a data frame. In many cases I apply the same is.na() check to many columns. My current code gives the correct result, but I have tried other approaches with apply, reduce and anyNA without success.

The following data frame resembles the data I'm working with:

library(dplyr)

set.seed(12)
dframe <- data.frame(
  x1 = sample(letters[1:2], 10, replace = TRUE),
  x2 = sample(0:1, 10, replace = TRUE),
  x3 = sample(0:2, 10, replace = TRUE),
  x4 = sample(0:2, 10, replace = TRUE),
  x5 = sample(0:2, 10, replace = TRUE),
  x6 = sample(0:2, 10, replace = TRUE)
) %>% 
  # recode the value 2 as NA in every numeric column
  mutate_if(is.numeric, list(~na_if(., 2)))

And it looks like this:

   x1 x2 x3 x4 x5 x6
1   b  1 NA  0  0  0
2   b  0  0  0 NA NA
3   b  1  0  0  0  1
4   a  0 NA  1 NA  0
5   a  1  1 NA NA NA
6   b  0 NA  1  1  1
7   a  1  1 NA NA  0
8   a  1  0  1 NA  0
9   b  1 NA NA  0  0
10  b  1  1  0 NA NA

Then, I create the column x7 based on the following conditions:

dframe %>% 
  mutate(
    x7 = case_when(
      x2 == 1 & 
      (!is.na(x3) | !is.na(x4) | !is.na(x5)) & 
      !is.na(x6) ~ 1,
      x2 == 1 ~ 0,
      TRUE ~ NA_real_
    )
  )

resulting in:

   x1 x2 x3 x4 x5 x6 x7
1   b  1 NA  0  0  0  1
2   b  0  0  0 NA NA NA
3   b  1  0  0  0  1  1
4   a  0 NA  1 NA  0 NA
5   a  1  1 NA NA NA  0
6   b  0 NA  1  1  1 NA
7   a  1  1 NA NA  0  1
8   a  1  0  1 NA  0  1
9   b  1 NA NA  0  0  1
10  b  1  1  0 NA NA  0

However, I want to find an alternative to writing (!is.na(x3) | !is.na(x4) | !is.na(x5)), because in my real script I have to type this for 11 columns.

I've tried to use complete.cases(x3, x4, x5), but it doesn't follow the logic I'm using in the code.

Using anyNA(x3, x4, x5) throws Error in anyNA(x3, x4, x5) : anyNA takes 1 or 2 arguments.

I also tried the answers to a similar question, but since I'm not using the condition for filtering, they didn't work for me.

Maybe I'm overthinking it, but what I'm looking for is something without having to use (!is.na(x3) | !is.na(x4) | !is.na(x5)).
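The difference between the two attempts above and the intended condition can be seen on a small example. This is only a sketch: the data frame toy and the variable names are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data, not the dframe from the question
toy <- data.frame(
  x3 = c(NA, 0, NA, 1),
  x4 = c(0, 1, NA, 1),
  x5 = c(0, NA, NA, 0)
)

# complete.cases(): TRUE only when ALL columns are non-missing in a row
all_present <- complete.cases(toy$x3, toy$x4, toy$x5)

# Condition from the question: TRUE when AT LEAST ONE column is non-missing
any_present <- !is.na(toy$x3) | !is.na(toy$x4) | !is.na(toy$x5)

# anyNA() collapses its input to a single TRUE/FALSE, so it is not row-wise
single_flag <- anyNA(toy)

print(data.frame(toy, all_present, any_present))
print(single_flag)
```

complete.cases() therefore answers "are all of them present?", while the question needs "is at least one of them present?", and anyNA() returns one value for the whole object rather than one per row.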

Ronak Shah · Accepted Answer

We could use rowSums and specify the columns by name:

library(dplyr)

dframe %>% 
  mutate(
    x7 = case_when(
      x2 == 1 & 
      rowSums(!is.na(.[c("x3", "x4", "x5")])) > 0 &
      !is.na(x6) ~ 1,
      x2 == 1 ~ 0,
      TRUE ~ NA_real_
    )
  )

Or by position:

rowSums(!is.na(.[3:5])) > 0

We could do this with inverted logic as well, counting the missing values and checking that not all of the columns are NA:

rowSums(is.na(.[c("x3","x4","x5")])) != 3

Or

rowSums(is.na(.[3:5])) != 3

We use 3 here because there are 3 columns to check in the given example (x3, x4 and x5). Change the number to match your actual number of columns (11).
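To avoid hard-coding the column count, the column names can be kept in a vector and the count taken from its length. The sketch below is not part of the original answer: toy and cols are hypothetical names, the data frame is indexed directly instead of through the `.` pronoun, and the if_any() variant assumes dplyr 1.0.4 or later.

```r
library(dplyr)

# Hypothetical toy data with the same column roles as in the question
toy <- data.frame(
  x2 = c(1, 1, 0, 1),
  x3 = c(NA, NA, 1, 0),
  x4 = c(0, NA, NA, NA),
  x5 = c(NA, NA, 0, NA),
  x6 = c(1, 0, 1, NA)
)

# Columns of which at least one must be non-missing (11 names in the real data)
cols <- c("x3", "x4", "x5")

# rowSums() on the selected columns; TRUE when at least one value is present
res_rowsums <- toy %>%
  mutate(x7 = case_when(
    x2 == 1 & rowSums(!is.na(toy[cols])) > 0 & !is.na(x6) ~ 1,
    x2 == 1 ~ 0,
    TRUE ~ NA_real_
  ))

# Inverted form: length(cols) replaces the hard-coded 3
inverted <- rowSums(is.na(toy[cols])) != length(cols)

# Same condition written with if_any() (assumes dplyr >= 1.0.4)
res_if_any <- toy %>%
  mutate(x7 = case_when(
    x2 == 1 & if_any(all_of(cols), ~ !is.na(.x)) & !is.na(x6) ~ 1,
    x2 == 1 ~ 0,
    TRUE ~ NA_real_
  ))

print(res_rowsums)
print(res_if_any)
```

With this layout, moving from 3 to 11 columns only requires editing cols; the condition inside case_when stays the same.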
