I am trying to find a more efficient or elegant way to write multiple conditions inside the case_when function.
I am creating a dummy column based on conditions across specific columns of a data frame, and in many cases I apply the same is.na() check to many columns. I get the correct result, but I have tried other approaches with apply, reduce and anyNA without success.
Let's say this data frame looks like the data I'm working on:
library(dplyr)

set.seed(12)
dframe <- data.frame(
  x1 = sample(letters[1:2], 10, replace = TRUE),
  x2 = sample(0:1, 10, replace = TRUE),
  x3 = sample(0:2, 10, replace = TRUE),
  x4 = sample(0:2, 10, replace = TRUE),
  x5 = sample(0:2, 10, replace = TRUE),
  x6 = sample(0:2, 10, replace = TRUE)
) %>%
  # recode every 2 in the numeric columns as NA
  mutate_if(is.numeric, list(~na_if(., 2)))
And it looks like this:
x1 x2 x3 x4 x5 x6
1 b 1 NA 0 0 0
2 b 0 0 0 NA NA
3 b 1 0 0 0 1
4 a 0 NA 1 NA 0
5 a 1 1 NA NA NA
6 b 0 NA 1 1 1
7 a 1 1 NA NA 0
8 a 1 0 1 NA 0
9 b 1 NA NA 0 0
10 b 1 1 0 NA NA
Then, I create the column x7 based on the following conditions:
dframe %>%
  mutate(
    x7 = case_when(
      x2 == 1 &
        (!is.na(x3) | !is.na(x4) | !is.na(x5)) &
        !is.na(x6) ~ 1,
      x2 == 1 ~ 0,
      TRUE ~ NA_real_
    )
  )
resulting in:
x1 x2 x3 x4 x5 x6 x7
1 b 1 NA 0 0 0 1
2 b 0 0 0 NA NA NA
3 b 1 0 0 0 1 1
4 a 0 NA 1 NA 0 NA
5 a 1 1 NA NA NA 0
6 b 0 NA 1 1 1 NA
7 a 1 1 NA NA 0 1
8 a 1 0 1 NA 0 1
9 b 1 NA NA 0 0 1
10 b 1 1 0 NA NA 0
However, I want an alternative to writing (!is.na(x3) | !is.na(x4) | !is.na(x5)), because in my real script I have to type this for 11 columns.
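I've tried complete.cases(x3, x4, x5), but it doesn't follow the logic I'm using: it is TRUE only when none of the values in the row are missing, whereas I need TRUE when at least one of them is present.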
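A minimal sketch of the difference, using three toy vectors (hypothetical values, not the data above): complete.cases() is the "all present" test, while the hand-written condition is the "any present" test.

```r
# Toy vectors: row 1 has one value present, row 2 has none, row 3 has all three
x3 <- c(1, NA, 0)
x4 <- c(NA, NA, 1)
x5 <- c(NA, NA, 0)

# complete.cases(): TRUE only when every value in the row is non-missing
all_present <- complete.cases(x3, x4, x5)  # -> FALSE FALSE TRUE

# the condition used in case_when(): TRUE when at least one value is non-missing
any_present <- !is.na(x3) | !is.na(x4) | !is.na(x5)  # -> TRUE FALSE TRUE
```

Row 1 is where the two disagree, which is why swapping one for the other changes x7.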
Using anyNA(x3, x4, x5) throws Error in anyNA(x3, x4, x5) : anyNA takes 1 or 2 arguments.
I also tried the answers to a similar question, but since I'm not filtering rows, they didn't work out.
Maybe I'm overthinking it, but what I'm looking for is a way to avoid writing (!is.na(x3) | !is.na(x4) | !is.na(x5)).
We could use rowSums and specify the columns by name:
library(dplyr)

dframe %>%
  mutate(x7 = case_when(
    x2 == 1 &
      rowSums(!is.na(.[c("x3", "x4", "x5")])) > 0 &
      !is.na(x6) ~ 1,
    x2 == 1 ~ 0,
    TRUE ~ NA_real_
  ))
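A self-contained sketch of the same rowSums() approach on a small hand-built data frame (hypothetical values, chosen so each branch of case_when() is hit). It assumes dplyr is installed and relies on the magrittr %>% pipe binding `.` to the whole data frame; this would not work with the native |> pipe.

```r
library(dplyr)

# Hypothetical data: row 4 has x3:x5 all missing, row 3 has x2 == 0
df <- data.frame(
  x2 = c(1, 1, 0, 1),
  x3 = c(NA, 1, 0, NA),
  x4 = c(0, NA, NA, NA),
  x5 = c(NA, NA, 1, NA),
  x6 = c(0, 1, 1, 0)
)

res <- df %>%
  mutate(x7 = case_when(
    # !is.na() on the column subset gives a logical matrix;
    # rowSums() counts the non-missing values in each row
    x2 == 1 & rowSums(!is.na(.[c("x3", "x4", "x5")])) > 0 & !is.na(x6) ~ 1,
    x2 == 1 ~ 0,
    TRUE ~ NA_real_
  ))

res$x7  # -> 1 1 NA 0
```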
Or by position
rowSums(!is.na(.[3:5])) > 0
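As a further alternative not in the original answer, dplyr 1.0.4 and later provide if_any(), which accepts a tidyselect range such as x3:x5 and is TRUE when the predicate holds in at least one selected column. A minimal sketch, assuming a recent dplyr and the same hypothetical data as before; the helper column any_x is a hypothetical name that is dropped at the end.

```r
library(dplyr)

# Hypothetical data, same shape as the question's columns x2:x6
df <- data.frame(
  x2 = c(1, 1, 0, 1),
  x3 = c(NA, 1, 0, NA),
  x4 = c(0, NA, NA, NA),
  x5 = c(NA, NA, 1, NA),
  x6 = c(0, 1, 1, 0)
)

res <- df %>%
  mutate(
    # TRUE when at least one of x3:x5 is non-missing in the row
    any_x = if_any(x3:x5, ~ !is.na(.x)),
    x7 = case_when(
      x2 == 1 & any_x & !is.na(x6) ~ 1,
      x2 == 1 ~ 0,
      TRUE ~ NA_real_
    )
  ) %>%
  select(-any_x)  # drop the helper column
```

With 11 columns, only the selection (for example x3:x13, if they are contiguous) needs to change.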
The rowSums() check can also be written with inverted logic:
rowSums(is.na(.[c("x3","x4","x5")])) != 3
Or
rowSums(is.na(.[3:5])) != 3
We use 3 here because there are three columns to check in the given example (x3, x4 and x5); change the number to match your actual number of columns (11).
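To avoid hard-coding 3 (or 11), the column names can be kept in a vector and the count taken from length(). A minimal base-R sketch, assuming a hypothetical vector cols and toy data; the Reduce() line shows an equivalent chained-OR version, since reduce was one of the approaches mentioned in the question.

```r
# Hypothetical column list; in the real script this would name all 11 columns
cols <- c("x3", "x4", "x5")

df <- data.frame(
  x3 = c(NA, 1, NA),
  x4 = c(0, NA, NA),
  x5 = c(NA, NA, NA)
)

# TRUE unless every column in `cols` is NA for that row
any_present <- rowSums(is.na(df[cols])) != length(cols)

# Same result: fold `|` over one !is.na() vector per column
any_present_reduce <- Reduce(`|`, lapply(df[cols], function(v) !is.na(v)))
```

Note that rowSums() keeps the data frame's row names, so any_present is a named logical vector; wrap it in unname() if plain values are needed.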