Reputation: 521
Within R, I use dplyr and, more specifically, arrange(). Somehow the arrange function doesn't work as expected.
In the example below, I first store the name of a column, then pass this variable as a parameter to a custom function called 'my_function'.
target_column = 'mean_age'
# below the function
my_function <- function(target_column, number){
  df <- read.csv('file.csv', stringsAsFactors=FALSE)
  df <- df[, c(1,4,10)]
  names(df) <- c('place','state','mean_age')
  df1 <- df %>% group_by(state) %>% arrange(target_column)
  df1 %>% summarise(rank = nth(target_column, number))
}
R returns an error when 'my_function' is called, due to the input to arrange():
"Error in arrange_impl(.data, dots) : incorrect size (1) at position 1, expecting : 4000"
When the column name is put directly into arrange(), instead of a variable that refers to a string (as in the example above), it does accept the parameter.
df %>% group_by(state) %>% arrange(mean_age)
How can I pass the parameter for the column name in a better way to 'my_function', so that arrange() will recognize it?
Upvotes: 5
Views: 6026
Reputation: 3700
2022/03/17 The tidyverse has evolved and so should this answer. The tidy eval functions enquo/unquo, sym/ensym, etc. are no longer the recommended approach.
library("tidyverse")
# Simulate data
read_df <- function(n = 100) {
  set.seed(1234)
  tibble(
    state = sample(c("A", "B", "C"), n, replace = TRUE),
    mean_age = rnorm(n)
  )
}
Case 1. If the target column is given as a string, use the .data pronoun, i.e., .data[[column_name]].
my_function <- function(column_name, number) {
  read_df() %>%
    group_by(state) %>%
    arrange(
      # Use `across(all_of())` instead of `across()` even with a single column
      # Otherwise will get the following warning:
      # > Using an external vector in selections is ambiguous
      across(all_of(column_name))
    ) %>%
    summarise(
      rank = nth(.data[[column_name]], number)
    )
}
my_function("mean_age", 10)
#> # A tibble: 3 × 2
#> state rank
#> <chr> <dbl>
#> 1 A -0.420
#> 2 B -0.584
#> 3 C -0.141
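As a minimal sketch of Case 1, the .data pronoun can also be used inside arrange() itself, not only in summarise(). The helper name nth_sorted is hypothetical, and the built-in mtcars data set stands in for the question's file, which is not available.

```r
library(dplyr)

# Hypothetical helper: sort by a column named in a string,
# then take the nth value of that column within each cyl group.
nth_sorted <- function(column_name, number) {
  mtcars %>%
    group_by(cyl) %>%
    arrange(.data[[column_name]]) %>%   # string lookup via the .data pronoun
    summarise(rank = nth(.data[[column_name]], number))
}

result <- nth_sorted("mpg", 2)
result
```

Note that arrange() ignores the grouping by default, so the rows are sorted globally; each group is still in ascending order afterwards, which is all that nth() needs here.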
Case 2. If the target column is given as an unquoted column name, there is no need for enquo anymore! Instead, enclose the function argument in double braces {{ }}, aka embrace it.
my_function <- function(column_var, number) {
  read_df() %>%
    group_by(state) %>%
    arrange(
      {{ column_var }}
    ) %>%
    summarise(
      rank = nth({{ column_var }}, number)
    )
}
my_function(mean_age, 10)
#> # A tibble: 3 × 2
#> state rank
#> <chr> <dbl>
#> 1 A -0.420
#> 2 B -0.584
#> 3 C -0.141
Created on 2022-03-17 by the reprex package (v2.0.1)
Upvotes: 1
Reputation: 7151
An update is necessary to the good answer by @avid_useR because 'rlang::parse_quosure' is deprecated now.
To give a short answer to the question how to make 'dplyr::arrange' accept a string or variable containing a string for the column name to sort, you can do:
target_column = rlang::sym('mean_age')
df %>% group_by(state) %>% arrange(!!target_column)
Or as a one-liner (if you only need to use it once):
df %>% group_by(state) %>% arrange(!!rlang::sym(target_column))
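Since the question's data file is not available, here is a small runnable sketch of the same one-liner on the built-in mtcars data set; the column names mpg and cyl are stand-ins for mean_age and state.

```r
library(dplyr)

# Column name held in a string, as in the question
target_column <- "mpg"

sorted <- mtcars %>%
  group_by(cyl) %>%
  arrange(!!rlang::sym(target_column))  # string -> symbol -> unquoted

head(sorted$mpg)
```

By default arrange() ignores the grouping, so the result is sorted by mpg across all rows; pass .by_group = TRUE if the rows should be sorted within each group instead.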
Upvotes: 4
Reputation: 18681
You need to first parse your string argument to a quosure, then unquote it with !!:
library(dplyr)
library(rlang)
target_column = 'mean_age'
my_function <- function(target_column, number){
  target_quo = parse_quosure(target_column)
  df <- read.csv('file.csv', stringsAsFactors=FALSE)
  df <- df[, c(1,4,10)]
  names(df) <- c('place','state','mean_age')
  df1 <- df %>% group_by(state) %>% arrange(!!target_quo)
  # unquote here as well; nth() on the bare string would not look up the column
  df1 %>% summarise(rank = nth(!!target_quo, number))
}
my_function('mean_age', 10)
If you want to be able to supply target_column as an unquoted column name, you can use enquo instead:
my_function <- function(target_column, number){
  target_quo = enquo(target_column)
  df <- read.csv('file.csv', stringsAsFactors=FALSE)
  df <- df[, c(1,4,10)]
  names(df) <- c('place','state','mean_age')
  df1 <- df %>% group_by(state) %>% arrange(!!target_quo)
  # unquote here as well so nth() operates on the column itself
  df1 %>% summarise(rank = nth(!!target_quo, number))
}
my_function(mean_age, 10)
Note: Normally, enquo will also work for string arguments, but arrange itself does not allow it, so the following does not work for the second example:
my_function('mean_age', 10)
Below is a toy example to demonstrate what I mean, since OP's question is not reproducible:
library(dplyr)
library(rlang)
test_func = function(var){
  var_quo = parse_quosure(var)
  mtcars %>%
    select(!!var_quo) %>%
    arrange(!!var_quo)
}
test_func2 = function(var){
  var_quo = enquo(var)
  mtcars %>%
    select(!!var_quo) %>%
    arrange(!!var_quo)
}
Results:
> test_func("mpg") %>%
+ head()
mpg
1 10.4
2 10.4
3 13.3
4 14.3
5 14.7
6 15.0
> test_func2(mpg) %>%
+ head()
mpg
1 10.4
2 10.4
3 13.3
4 14.3
5 14.7
6 15.0
> test_func2("mpg") %>%
+ head()
Error in arrange_impl(.data, dots) : incorrect size (1) at position 1, expecting : 32
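The difference between the last two calls can be seen by inspecting what enquo captures in each case. The sketch below uses a hypothetical helper named capture; it only shows the captured expressions and does not call arrange.

```r
library(rlang)

# Hypothetical helper that just returns the captured quosure
capture <- function(var) enquo(var)

q_bare   <- capture(mpg)     # unquoted column name
q_string <- capture("mpg")   # string

# A bare name is captured as a symbol, which arrange() can look up as a column
is_symbol(quo_get_expr(q_bare))

# A string is captured as a length-1 character constant, not a column reference
is_string(quo_get_expr(q_string))

# sym() turns the string into the same symbol the bare name produces
identical(sym("mpg"), quo_get_expr(q_bare))
```

So with a string argument, arrange receives a single constant value rather than a column, which is consistent with the "incorrect size (1)" message above.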
Upvotes: 5