Reputation: 1904
I am trying to supply a vector that contains multiple column names to a mutate() call using the dplyr package. Reproducible example below:
stackdf <- data.frame(jack = c(1,NA,2,NA,3,NA,4,NA,5,NA),
jill = c(1,2,NA,3,4,NA,5,6,NA,7),
jane = c(1,2,3,4,5,6,NA,NA,NA,NA))
two_names <- c('jack','jill')
one_name <- c('jack')
# jack jill jane
# 1 1 1
# NA 2 2
# 2 NA 3
# NA 3 4
# 3 4 5
# NA NA 6
# 4 5 NA
# NA 6 NA
# 5 NA NA
# NA 7 NA
I was able to figure out the "one variable" version, but I do not know how to extend it to multiple variables.
# the below works as expected, and is an example of the output I desire
stackdf %>% rowwise %>% mutate(test = anyNA(c(jack,jill)))
# A tibble: 10 x 4
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 TRUE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA TRUE
10 NA 7 NA TRUE
# using the one_name variable works if I evaluate it and then convert to
# a name before unquoting it
stackdf %>% rowwise %>% mutate(test = anyNA(!!as.name(eval(one_name))))
# A tibble: 10 x 4
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 FALSE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA FALSE
10 NA 7 NA TRUE
How can I extend the above approach so that I could use the two_names vector? Using as.name only takes a single object so it does not work.
This question here is similar: Pass a vector of variable names to arrange() in dplyr. That solution "works" in that I can use the below code:
two_names2 <- quos(c(jack, jill))
stackdf %>% rowwise %>% mutate(test = anyNA(!!!two_names2))
But it defeats the purpose if I have to type c(jack, jill) directly rather than using the two_names variable. Is there some similar procedure where I can use two_names directly? This answer, How to pass a named vector to dplyr::select using quosures?, uses rlang::syms, but although this works for selecting variables (i.e. stackdf %>% select(!!! rlang::syms(two_names))), it does not seem to work for supplying arguments when mutating (i.e. stackdf %>% rowwise %>% mutate(test = anyNA(!!! rlang::syms(two_names)))). This answer is similar but does not work: How to evaluate a constructed string with non-standard evaluation using dplyr?
Upvotes: 3
Views: 1392
Reputation: 43334
You can use rlang::syms (which is reexported by dplyr; alternately, call it directly) to coerce strings to symbols, which can then be spliced into the call with !!!, so
library(dplyr)
stackdf <- data.frame(jack = c(1,NA,2,NA,3,NA,4,NA,5,NA),
jill = c(1,2,NA,3,4,NA,5,6,NA,7),
jane = c(1,2,3,4,5,6,NA,NA,NA,NA))
two_names <- c('jack','jill')
stackdf %>% rowwise %>% mutate(test = anyNA(c(!!!syms(two_names))))
#> Source: local data frame [10 x 4]
#> Groups: <by row>
#>
#> # A tibble: 10 x 4
#> jack jill jane test
#> <dbl> <dbl> <dbl> <lgl>
#> 1 1. 1. 1. FALSE
#> 2 NA 2. 2. TRUE
#> 3 2. NA 3. TRUE
#> 4 NA 3. 4. TRUE
#> 5 3. 4. 5. FALSE
#> 6 NA NA 6. TRUE
#> 7 4. 5. NA FALSE
#> 8 NA 6. NA TRUE
#> 9 5. NA NA TRUE
#> 10 NA 7. NA TRUE
Alternatively, using a little base R instead of tidy eval:
stackdf %>% mutate(test = rowSums(is.na(.[two_names])) > 0)
#> jack jill jane test
#> 1 1 1 1 FALSE
#> 2 NA 2 2 TRUE
#> 3 2 NA 3 TRUE
#> 4 NA 3 4 TRUE
#> 5 3 4 5 FALSE
#> 6 NA NA 6 TRUE
#> 7 4 5 NA FALSE
#> 8 NA 6 NA TRUE
#> 9 5 NA NA TRUE
#> 10 NA 7 NA TRUE
...which will likely be a lot faster, as iterating rowwise makes n calls instead of one vectorized one.
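On more recent versions of dplyr, the same vectorized check can be written without tidy eval or rowwise. This is a minimal sketch, assuming dplyr 1.0.4 or later (where if_any() became available) and the stackdf and two_names objects from the question. It is an alternative, not what either answer originally used.

```r
library(dplyr)

stackdf <- data.frame(jack = c(1, NA, 2, NA, 3, NA, 4, NA, 5, NA),
                      jill = c(1, 2, NA, 3, 4, NA, 5, 6, NA, 7),
                      jane = c(1, 2, 3, 4, 5, 6, NA, NA, NA, NA))
two_names <- c("jack", "jill")

# all_of() selects the columns named in the character vector;
# if_any() applies is.na() to each column and combines the results with |
result <- stackdf %>%
  mutate(test = if_any(all_of(two_names), is.na))

# Base R cross-check using the rowSums() approach
expected <- unname(rowSums(is.na(stackdf[two_names])) > 0)

print(result)
```

Because all_of() takes the character vector as-is, no conversion to symbols is needed here.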
Upvotes: 7
Reputation: 1904
There are two keys to solving this question: using dynamic variables with dplyr, and how the resulting arguments are passed to the function called inside mutate, here anyNA.
The goal here is to replicate this call, but using the named variable two_names instead of manually typing out c(jack,jill).
stackdf %>% rowwise %>% mutate(test = anyNA(c(jack,jill)))
# A tibble: 10 x 4
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 TRUE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA TRUE
10 NA 7 NA TRUE
1. Using dynamic variables with dplyr
Using quo/quos: Does not accept strings as input. The solution using this method would be:
two_names2 <- quos(c(jack, jill))
stackdf %>% rowwise %>% mutate(test = anyNA(!!! two_names2))
Note that quo takes a single argument and is unquoted using !!, whereas quos takes multiple arguments and is unquoted using !!!. This is not desirable because I do not use two_names and instead have to type out the columns I wish to use.
Using as.name or rlang::sym/rlang::syms: as.name and sym take only a single input, whereas syms takes multiple inputs and returns a list of symbols as output.
> two_names
[1] "jack" "jill"
> as.name(two_names)
jack
> syms(two_names)
[[1]]
jack
[[2]]
jill
Note that as.name ignores everything after the first element. However, syms appears to work appropriately here, so now we need to use this within the mutate call.
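To see what syms is doing, here is a small base R sketch. It assumes that lapply() with as.name is a fair stand-in for rlang::syms; that is an approximation for illustration, not a statement about how rlang implements it.

```r
two_names <- c("jack", "jill")

# as.name() silently keeps only the first element of a longer vector
first_only <- as.name(two_names)

# Converting element by element gives one symbol per column name
all_symbols <- lapply(two_names, as.name)

print(first_only)
print(all_symbols)
```

The second form returns a list with one symbol per string, which is the shape that !!! expects when splicing.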
2. Using dynamic variables within mutate using anyNA or other functions
Using syms and anyNA directly does not actually produce the correct result.
> stackdf %>% rowwise %>% mutate(test = anyNA(!!! syms(two_names)))
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 FALSE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA FALSE
10 NA 7 NA TRUE
Inspection of the test column shows that this is only taking into account the first element and ignoring the second. However, if I use a different function, e.g. sum or paste0, it is clear that both elements are being used:
> stackdf %>% rowwise %>% mutate(test = sum(!!! syms(two_names),
na.rm = TRUE))
jack jill jane test
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 2
2 NA 2 2 2
3 2 NA 3 2
4 NA 3 4 3
5 3 4 5 7
6 NA NA 6 0
7 4 5 NA 9
8 NA 6 NA 6
9 5 NA NA 5
10 NA 7 NA 7
The reason for this becomes clear when you look at the arguments for anyNA vs sum.
function (x, recursive = FALSE) .Primitive("anyNA")
function (..., na.rm = FALSE) .Primitive("sum")
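anyNA expects a single object x, whereas sum can take a variable list of objects (...). When jack and jill are spliced into anyNA, jack is matched to x and jill lands in the recursive position, so only jack is checked for NA.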
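The effect of the two argument lists can be reproduced in base R. In this sketch, like_anyNA and like_sum are hypothetical helpers that only copy the signatures of anyNA and sum, and row_values stands for the values of jack and jill in row 3 of stackdf.

```r
# Argument names of the two primitives, as reported by args()
anyNA_args <- names(formals(args(anyNA)))
sum_args   <- names(formals(args(sum)))

# Hypothetical stand-ins that copy the two argument lists
like_anyNA <- function(x, recursive = FALSE) anyNA(x)
like_sum   <- function(..., na.rm = FALSE) anyNA(c(...))

# Values of jack and jill in row 3 of stackdf
row_values <- list(2, NA)

# Equivalent to like_anyNA(2, NA): the NA is matched to `recursive`
single_arg <- do.call(like_anyNA, row_values)

# Equivalent to like_sum(2, NA): both values are collected by `...`
dots_arg <- do.call(like_sum, row_values)

# Combining the values first means `x` receives both of them
combined <- like_anyNA(c(2, NA))

print(c(single_arg = single_arg, dots_arg = dots_arg, combined = combined))
```

Only the call whose first argument contains both values detects the NA in the second column.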
Simply supplying c() fixes this problem (see answer from alistaire).
> stackdf %>% rowwise %>% mutate(test = anyNA(c(!!! syms(two_names))))
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 TRUE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA TRUE
10 NA 7 NA TRUE
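One way to check what !!! does in each case is to build the expression without evaluating it. This is a minimal sketch, assuming the rlang package is attached; rlang::expr is used here only for inspection and is not part of the original answers.

```r
library(rlang)

two_names <- c("jack", "jill")

# Splicing directly puts each symbol in its own argument position
spliced <- expr(anyNA(!!!syms(two_names)))

# Wrapping in c() first yields a single argument for anyNA()
wrapped <- expr(anyNA(c(!!!syms(two_names))))

print(spliced)
print(wrapped)
```

The first expression passes two arguments to anyNA, while the second passes one combined vector, which matches the call typed out by hand.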
Alternately... for educational purposes, one could use a combination of sapply, any, and anyNA to produce the correct result. Here we use list so that the results are provided as a single list object.
# this produces an error because the elements of !!!
# are being passed to the arguments of sapply (X =, FUN = )
> stackdf %>% rowwise %>%
mutate(test = any(sapply(!!! syms(two_names), anyNA)))
Error in mutate_impl(.data, dots) :
Evaluation error: object 'jill' of mode 'function' was not found.
Supplying list fixes this problem because it binds all the results into a single object. Note that as.list does not, because it converts only its first argument.
# the below table is the familiar incorrect result that uses only `jack`
> stackdf %>% rowwise %>%
mutate(test = any(sapply(X=as.list(!!! syms(two_names)),
FUN=anyNA)))
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 FALSE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA FALSE
10 NA 7 NA TRUE
# this produces the correct answer
> stackdf %>% rowwise %>%
mutate(test = any(sapply(X = list(!!! syms(two_names)),
FUN = anyNA)))
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 TRUE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA TRUE
10 NA 7 NA TRUE
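Why these two perform differently makes sense when the behavior of as.list and list is compared: as.list converts a single object into a list, whereas list collects all of its arguments into one.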
> as.list(two_names)
[[1]]
[1] "jack"
[[2]]
[1] "jill"
> list(two_names)
[[1]]
[1] "jack" "jill"
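The printed output uses a single character vector, whereas inside mutate the spliced call receives two separate arguments. A short base R sketch of that situation, using the values of jack and jill in row 3 of stackdf as illustrative inputs:

```r
# Two separate arguments, as produced by splicing with !!!
from_as_list <- as.list(2, NA)  # only the first argument is converted
from_list    <- list(2, NA)     # every argument becomes an element

# Apply anyNA to each element, then reduce with any()
only_first <- any(sapply(from_as_list, anyNA))
both       <- any(sapply(from_list, anyNA))

print(c(only_first = only_first, both = both))
```

With as.list the NA from the second column never reaches anyNA, which explains the incorrect table, while list keeps both values.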
Upvotes: 6