I am re-posting this question due to a pretty serious mistake in my previous reproducible example.
My data look like this:
set.seed(123)
X_foo <- runif(6, 0, 1)
X_bar <- runif(6, 0, 100)
Y_foo <- runif(6, 0, 1)
Y_bar <- runif(6, 0, 100)
Z_foo <- runif(6, 0, 1)
Z_bar <- runif(6, 0, 100)
df <- data.frame(X_foo, X_bar, Y_foo, Y_bar, Z_foo, Z_bar)
df
X_foo X_bar Y_foo Y_bar Z_foo Z_bar
1 0.2875775 52.81055 0.67757064 32.79207 0.6557058 96.302423
2 0.7883051 89.24190 0.57263340 95.45036 0.7085305 90.229905
3 0.4089769 55.14350 0.10292468 88.95393 0.5440660 69.070528
4 0.8830174 45.66147 0.89982497 69.28034 0.5941420 79.546742
5 0.9404673 95.68333 0.24608773 64.05068 0.2891597 2.461368
6 0.0455565 45.33342 0.04205953 99.42698 0.1471136 47.779597
I will be asked to return the top 3 values, ranked, from any one (and only one) of the six variables in the data. The function I wrote to do this is:
library(dplyr)
aRankingFunction <- function(aMetric1 = "X", aMetric2 = "foo") {
# list of names that the function will accept
good_metric1 <- c("X", "Y", "Z")
good_metric2 <- c("foo", "bar")
# use an if statement, so if user enters a bad name they get an error back
if((aMetric1 %in% good_metric1) & (aMetric2 %in% good_metric2)) {
thePull <- df %>%
# Select statement should pull exactly one variable (by default, X_foo)
select(contains(aMetric1)) %>%
select(contains(aMetric2))
} else {
return("Error")
}
theOutput <- thePull %>%
# Create a new variable with the ranks of the variable pulled
mutate(Rank = min_rank()) %>% # This is where the function breaks
# Sort the ranks
arrange(desc(Rank)) %>%
# Filter for ranks 1,2,3
filter(Rank <= 3)
return(theOutput)
}
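As an aside on the select step: contains() matches substrings and ignores case by default, so the two chained calls isolate a single column only because of how these six names happen to be built. A stricter sketch builds the exact name with paste() and selects it with all_of(), which errors on a name that does not exist. pull_metric() is a hypothetical helper, and all_of() assumes dplyr 1.0 or later.

```r
library(dplyr)

# The question's data
set.seed(123)
X_foo <- runif(6, 0, 1)
X_bar <- runif(6, 0, 100)
Y_foo <- runif(6, 0, 1)
Y_bar <- runif(6, 0, 100)
Z_foo <- runif(6, 0, 1)
Z_bar <- runif(6, 0, 100)
df <- data.frame(X_foo, X_bar, Y_foo, Y_bar, Z_foo, Z_bar)

# contains() is case-insensitive by default: lower-case "x" still matches X_foo and X_bar
names(select(df, contains("x")))

# Hypothetical helper: build the exact column name (e.g. "X_foo") and select only it;
# all_of() raises an error if that column does not exist
pull_metric <- function(data, aMetric1 = "X", aMetric2 = "foo") {
  col <- paste(aMetric1, aMetric2, sep = "_")
  select(data, all_of(col))
}

names(pull_metric(df))
names(pull_metric(df, "Z", "bar"))
```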
But when I run aRankingFunction(), it breaks. I've marked where the break happens: I can't figure out what should replace the * in the statement mutate(Rank = min_rank(*)). That statement will rank whichever one of the six variables was chosen, but I won't know which one until runtime.
How do I tell the mutate statement, dynamically, to "use the variable name that has been chosen"?
Just focusing on the part that needs work: you need to turn the string you have into a symbol, then inject that symbol into the dplyr call with the bang-bang !! operator.
...
rankvar <- as.symbol(names(thePull))
theOutput <- thePull %>%
# Create a new variable with the ranks of the variable pulled
mutate(Rank = min_rank(!!rankvar)) %>%
...
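Assembled into the question's function, the !! approach might look like the sketch below. The parts elided with ... are filled in from the question's code; the only other changes are && in the if condition and returning the piped result directly. It assumes dplyr is installed.

```r
library(dplyr)

# The question's data
set.seed(123)
X_foo <- runif(6, 0, 1)
X_bar <- runif(6, 0, 100)
Y_foo <- runif(6, 0, 1)
Y_bar <- runif(6, 0, 100)
Z_foo <- runif(6, 0, 1)
Z_bar <- runif(6, 0, 100)
df <- data.frame(X_foo, X_bar, Y_foo, Y_bar, Z_foo, Z_bar)

aRankingFunction <- function(aMetric1 = "X", aMetric2 = "foo") {
  # list of names that the function will accept
  good_metric1 <- c("X", "Y", "Z")
  good_metric2 <- c("foo", "bar")
  if ((aMetric1 %in% good_metric1) && (aMetric2 %in% good_metric2)) {
    thePull <- df %>%
      # pulls exactly one variable (by default, X_foo)
      select(contains(aMetric1)) %>%
      select(contains(aMetric2))
  } else {
    return("Error")
  }
  # Turn the selected column's name (a string) into a symbol
  rankvar <- as.symbol(names(thePull))
  thePull %>%
    # !! injects the symbol, so by default this evaluates as min_rank(X_foo)
    mutate(Rank = min_rank(!!rankvar)) %>%
    arrange(desc(Rank)) %>%
    filter(Rank <= 3)
}

aRankingFunction()
aRankingFunction("Z", "bar")
```

rlang::sym() is a common alternative to as.symbol() here; both produce a symbol that !! can inject.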
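Another alternative, in this special case where the selection has exactly one column, is to rank every column with across() and name the result Rank (this needs dplyr 1.0 or later):
...
theOutput <- thePull %>%
# Create a new variable with the ranks of the variable pulled
mutate(across(everything(), min_rank, .names = "Rank")) %>%
...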
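Another way to refer to a column chosen at runtime, not used in either answer here, is the .data pronoun: .data[[col]] looks a column up by a string, so no symbol conversion is needed. A minimal sketch, assuming dplyr 1.0 or later; rank_by_name() is a hypothetical helper that keeps the question's ordering and cut-off.

```r
library(dplyr)

# The question's data
set.seed(123)
X_foo <- runif(6, 0, 1)
X_bar <- runif(6, 0, 100)
Y_foo <- runif(6, 0, 1)
Y_bar <- runif(6, 0, 100)
Z_foo <- runif(6, 0, 1)
Z_bar <- runif(6, 0, 100)
df <- data.frame(X_foo, X_bar, Y_foo, Y_bar, Z_foo, Z_bar)

# Hypothetical helper: rank the column whose name is passed as a string
rank_by_name <- function(data, col) {
  data %>%
    select(all_of(col)) %>%
    # .data[[col]] resolves the column named by the string stored in col
    mutate(Rank = min_rank(.data[[col]])) %>%
    # same ordering and cut-off as the question's function
    arrange(desc(Rank)) %>%
    filter(Rank <= 3)
}

rank_by_name(df, "Y_bar")
```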
You can pass thePull as an argument to min_rank() and run the mutate() on df instead, so all the other columns are kept in the output:
aRankingFunction <- function(aMetric1 = "X", aMetric2 = "foo") {
# list of names that the function will accept
good_metric1 <- c("X", "Y", "Z")
good_metric2 <- c("foo", "bar")
# use an if statement, so if user enters a bad name they get an error back
if((aMetric1 %in% good_metric1) & (aMetric2 %in% good_metric2)) {
thePull <- df %>%
# Select statement should pull exactly one variable (by default, X_foo)
select(contains(aMetric1)) %>%
select(contains(aMetric2))
} else {
return("Error")
}
theOutput <- df %>%
# Create a new variable with the ranks of the variable pulled
mutate(Rank = min_rank(thePull)) %>% # Ranks come from the pulled column; its rows line up with df
# Sort the ranks
arrange(desc(Rank)) %>%
# Filter for ranks 1,2,3
filter(Rank <= 3)
return(theOutput)
}
> aRankingFunction()
X_foo X_bar Y_foo Y_bar Z_foo Z_bar Rank
1 0.4089769 55.14350 0.10292468 88.95393 0.5440660 69.07053 3
2 0.2875775 52.81055 0.67757064 32.79207 0.6557058 96.30242 2
3 0.0455565 45.33342 0.04205953 99.42698 0.1471136 47.77960 1
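Both answers keep the question's ranking, where min_rank() gives rank 1 to the smallest value, so filter(Rank <= 3) returns the three smallest values (the output above shows X_foo values 0.409, 0.288 and 0.046). If "top 3" is meant as the three largest, one sketch is to wrap the values in desc() inside min_rank(). Extracting the chosen column as a plain vector first also sidesteps passing a data frame to min_rank(). top3_largest() is a hypothetical helper.

```r
library(dplyr)

# The question's data
set.seed(123)
X_foo <- runif(6, 0, 1)
X_bar <- runif(6, 0, 100)
Y_foo <- runif(6, 0, 1)
Y_bar <- runif(6, 0, 100)
Z_foo <- runif(6, 0, 1)
Z_bar <- runif(6, 0, 100)
df <- data.frame(X_foo, X_bar, Y_foo, Y_bar, Z_foo, Z_bar)

# Hypothetical helper: keep every column, rank the chosen one from largest down
top3_largest <- function(data, col) {
  values <- data[[col]]  # the chosen column as a plain numeric vector
  data %>%
    # desc() reverses the order, so the largest value gets Rank 1
    mutate(Rank = min_rank(desc(values))) %>%
    arrange(Rank) %>%
    filter(Rank <= 3)
}

top3_largest(df, "X_foo")
```

Because min_rank() gives tied values the same rank, a tie for third place would return more than three rows.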