Reputation: 11802
I found this question already asked but without proper answer. R using variable column names in summarise function in dplyr
I want to calculate the difference between two column means, but the column name should be provided by variables... So far I found only the function as.name
to provide column names as text, but this somehow doesn't work here...
With fix column names it works.
x <- c('a','b')
df <- group_by(data.frame(a=c(1,2,3,4), b=c(2,3,4,5), c=c(1,1,2,2)), c)
df %>% summarise(mean(a) - mean(b))
With variable columns, it doesn't work
df %>% summarise(mean(x[1]) - mean(x[2]))
df %>% summarise(mean(as.name(x[1])) - mean(as.name(x[2])))
Since this was asked already 3 years ago and dplyr
is under good development, I am wondering if there is an answer to this now.
Upvotes: 4
Views: 4960
Reputation: 21
This is not a direct answer to your question but maybe could be useful for other people reading your post: It could be easier to use variable columns directly, like
df %>% summarise(someName = mean(.[[1]]) - mean(.[[2]]))
############ which is the same as ############
df %>% summarise(someName = mean(.[,1,drop=T]) - mean(.[,2,drop=T]))
Note that drop=T is because when using just single square bracket the result preserves the class (in this case class( . ) = data.frame) and this isn't what we want (columns must be given in vector form to the summarise function)
Upvotes: 1
Reputation: 47350
You can use base::get
:
df %>% summarise(mean(get(x[1])) - mean(get(x[2])))
# # A tibble: 2 x 2
# c `mean(a) - mean(b)`
# <dbl> <dbl>
# 1 1 -1
# 2 2 -1
get
will search in current environment by default.
As the error message says, mean
expects a logical or numeric object, as.name
returns a name:
class(as.name("a")) # [1] "name"
You could evaluate your name, that would work as well :
df %>% summarise(mean(eval(as.name(x[1]))) - mean(eval(as.name(x[2]))))
# # A tibble: 2 x 2
# c `mean(eval(as.name(x[1]))) - mean(eval(as.name(x[2])))`
# <dbl> <dbl>
# 1 1 -1
# 2 2 -1
Upvotes: 9