mmyoung77

Reputation: 1417

Handling dynamic variable names in dplyr

I am re-posting this question due to a pretty serious mistake in my previous reproducible example.

My data look like this:

set.seed(123)
X_foo <- runif(6, 0, 1)
X_bar <- runif(6, 0, 100) 
Y_foo <- runif(6, 0, 1) 
Y_bar <- runif(6, 0, 100)
Z_foo <- runif(6, 0, 1)
Z_bar <- runif(6, 0, 100)
df <- data.frame(X_foo, X_bar, Y_foo, Y_bar, Z_foo, Z_bar)
df
      X_foo    X_bar      Y_foo    Y_bar     Z_foo     Z_bar
1 0.2875775 52.81055 0.67757064 32.79207 0.6557058 96.302423
2 0.7883051 89.24190 0.57263340 95.45036 0.7085305 90.229905
3 0.4089769 55.14350 0.10292468 88.95393 0.5440660 69.070528
4 0.8830174 45.66147 0.89982497 69.28034 0.5941420 79.546742
5 0.9404673 95.68333 0.24608773 64.05068 0.2891597  2.461368
6 0.0455565 45.33342 0.04205953 99.42698 0.1471136 47.779597

I will be asked to return the top 3 values, ranked, from any one (and only one) of the six variables in the data. The function I wrote to do this is:

aRankingFunction <- function(aMetric1 = "X", aMetric2 = "foo") {
  # list of names that the function will accept
  good_metric1 <- c("X", "Y", "Z")
  good_metric2 <- c("foo", "bar")
  # use an if statement, so if user enters a bad name they get an error back 
  if((aMetric1 %in% good_metric1) & (aMetric2 %in% good_metric2)) {
    thePull <- df %>%
      # Select statement should pull exactly one variable (by default, X_foo)
      select(contains(aMetric1)) %>%
      select(contains(aMetric2))
    } else {
      return("Error")
    }
  theOutput <- thePull %>%
    # Create a new variable with the ranks of the variable pulled
    mutate(Rank = min_rank()) %>% # This is where the function breaks
    # Sort the ranks
    arrange(desc(Rank)) %>%
    # Filter for ranks 1,2,3
    filter(Rank <= 3)
  return(theOutput)
}

But when I run aRankingFunction(), it breaks at the line marked above. I can't figure out what should replace the * in the statement mutate(Rank = min_rank(*)). That statement ranks whichever one of the six variables was chosen, but I won't know which one until runtime.

How do I tell the mutate statement, dynamically, "use the variable name that has been chosen"?

Upvotes: 1

Views: 365

Answers (2)

MrFlick

Reputation: 206167

Focusing on just the part that needs work: turn the string you have into a symbol, then inject that symbol into the dplyr call with the bang-bang (!!) operator.

...
rankvar <- as.symbol(names(thePull))
theOutput <- thePull %>%
  # Create a new variable with the ranks of the variable pulled
  mutate(Rank = min_rank(!!rankvar)) %>%
...
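
For readers who want to run the symbol-injection idea on its own, here is a minimal, self-contained sketch. It assumes a cut-down version of the question's data, and the variable `chosen` is a hypothetical stand-in for whatever column name the select() logic resolves to at runtime; it is not part of the original answer.

```r
library(dplyr)

set.seed(123)
df <- data.frame(X_foo = runif(6, 0, 1), X_bar = runif(6, 0, 100))

# Hypothetical: the column name is only known at runtime, as a string
chosen <- "X_foo"

# Convert the string to a symbol so dplyr can treat it as a column reference
rankvar <- as.symbol(chosen)

ranked <- df %>%
  select(all_of(chosen)) %>%
  # !! unquotes the symbol, so this evaluates as min_rank(X_foo)
  mutate(Rank = min_rank(!!rankvar)) %>%
  arrange(Rank) %>%
  filter(Rank <= 3)

print(ranked)
```

Because `rankvar` is built from a string, the same pipeline works unchanged for any of the six columns.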

Alternatively, in this special case where thePull has only one column, you can apply the ranking function to every column:

...
theOutput <- thePull %>%
  # Create a new variable with the ranks of the variable pulled
  mutate_all(funs(Rank = min_rank)) %>%
...
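
Note that funs() has been deprecated since dplyr 0.8.0, and mutate_all() has been superseded by across() since dplyr 1.0.0, so the snippet above may emit warnings or fail on a recent installation. The following is a sketch of two possible equivalents, assuming dplyr 1.0.0 or later; the names `viaAcross`, `viaPronoun`, and `col` are illustrative only.

```r
library(dplyr)

set.seed(123)
# Stand-in for the one-column result of the select() calls
thePull <- data.frame(X_foo = runif(6, 0, 1))

# Option A: across() over the single column, naming the new column "Rank"
viaAcross <- thePull %>%
  mutate(across(everything(), min_rank, .names = "Rank"))

# Option B: look the column up by its string name with the .data pronoun
col <- names(thePull)[1]
viaPronoun <- thePull %>%
  mutate(Rank = min_rank(.data[[col]]))

print(viaAcross)
```

Option A relies on thePull having exactly one column, as the original does; with more columns, a fixed `.names = "Rank"` would produce duplicate names. Option B does not have that restriction.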

Upvotes: 3

zack

Reputation: 5405

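You can pass thePull as the argument to min_rank(). Note that the final pipeline starts from df rather than thePull, so all six columns are kept alongside the rank: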

aRankingFunction <- function(aMetric1 = "X", aMetric2 = "foo") {
  # list of names that the function will accept
  good_metric1 <- c("X", "Y", "Z")
  good_metric2 <- c("foo", "bar")
  # use an if statement, so if user enters a bad name they get an error back 
  if((aMetric1 %in% good_metric1) & (aMetric2 %in% good_metric2)) {
    thePull <- df %>%
      # Select statement should pull exactly one variable (by default, X_foo)
      select(contains(aMetric1)) %>%
      select(contains(aMetric2))
  } else {
    return("Error")
  }
  theOutput <- df %>%
    # Create a new variable with the ranks of the variable pulled
    mutate(Rank = min_rank(thePull)) %>% # thePull holds the single selected column
    # Sort the ranks
    arrange(desc(Rank)) %>%
    # Filter for ranks 1,2,3
    filter(Rank <= 3)
  return(theOutput)
}

> aRankingFunction()
      X_foo    X_bar      Y_foo    Y_bar     Z_foo    Z_bar Rank
1 0.4089769 55.14350 0.10292468 88.95393 0.5440660 69.07053    3
2 0.2875775 52.81055 0.67757064 32.79207 0.6557058 96.30242    2
3 0.0455565 45.33342 0.04205953 99.42698 0.1471136 47.77960    1
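
One thing to check in the output above: the three rows returned hold the three smallest X_foo values, because min_rank() assigns rank 1 to the smallest value. If "top 3" is meant to be the three largest, wrapping the column in desc() is one way to flip the ranking. This is a sketch under that assumption; `topThree()` is a hypothetical helper, not part of either answer.

```r
library(dplyr)

set.seed(123)
df <- data.frame(X_foo = runif(6, 0, 1), X_bar = runif(6, 0, 100))

# Hypothetical helper: rank a column chosen by name, largest value first
topThree <- function(data, col, n = 3) {
  rankvar <- as.symbol(col)
  data %>%
    # desc() flips the ordering so the largest value gets rank 1
    mutate(Rank = min_rank(desc(!!rankvar))) %>%
    filter(Rank <= n) %>%
    arrange(Rank)
}

top_foo <- topThree(df, "X_foo")
print(top_foo)
```

Passing the data frame in as an argument, rather than reading df from the global environment, also makes the function easier to reuse and test.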

Upvotes: 1
