kuba

Reputation: 1025

Use variable names in functions of dplyr

I want to use variable names as strings in functions of dplyr. See the example below:

df <- data.frame( 
      color = c("blue", "black", "blue", "blue", "black"), 
      value = 1:5)
filter(df, color == "blue")

It works perfectly, but I would like to refer to color by string, something like this:

var <- "color"
filter(df, this_probably_should_be_a_function(var) == "blue")

I would be happy to do this by any means, and super-happy to keep the easy-to-read dplyr syntax.

Upvotes: 48

Views: 43718

Answers (9)

akrun

Reputation: 886978

In newer versions, we can quote the variable and then unquote it (with UQ or !!) for evaluation:

var <- quo(color)
filter(df, UQ(var) == "blue")
#   color value
#1  blue     1
#2  blue     3
#3  blue     4

Due to operator precedence, we may require () to wrap around !!

filter(df, (!!var) == "blue")
#   color value
#1  blue     1
#2  blue     3
#3  blue     4

In newer versions, !! has higher precedence, so

filter(df, !! var == "blue")

should work (as @Moody_Mudskipper commented)

Older option

We may also use:

# var is the string "color" here, not a quosure
filter(df, get(var, envir = as.environment(df)) == "blue")
#   color value
#1  blue     1
#2  blue     3
#3  blue     4

EDIT: Rearranged the order of solutions

Upvotes: 37

Ben Bolker

Reputation: 226097

new with rlang version >= 0.4.0

.data is now recognized as a way to refer to the parent data frame, so reference by string works as follows:

var <- "color"
filter(df, .data[[var]] == "blue")

If the variable is already a symbol, then {{}} will dereference it properly

example 1:

var <- quo(color)
filter(df, {{var}} == "blue")

or more realistically

f <- function(v) {
    filter(df, {{v}} == "blue")
}
f(color) # Curly-curly provides automatic NSE support

More reading and examples are provided in the Programming with dplyr article/vignette.

Upvotes: 16

Geoffrey Poole

Reputation: 1268

This question was posted 6 years ago. dplyr is now up to version 1.0.2. Yet this is still a great discussion and helped me immensely with my problem. I wanted to be able to construct filters from columns, operators, and values that are all specified by variables in memory. Oh, and for an indeterminate number of filters!

Consider the following list where I specify the column, the operator, and the value for two filters:

myFilters = 
  list(
    list(var = "color", op = "%in%", val = "blue"),
    list(var = "value", op = "<=", val = 3)
  )

From this list, I want to run:

dplyr::filter(df, color %in% "blue", value <= 3)

We can use lapply on the list above to create a list of call objects, splice them into filter as separate arguments with the !!! operator, and let filter evaluate them against the data:

library(dplyr)

df <- data.frame( 
  color = c("blue", "black", "blue", "blue", "black"), 
  value = 1:5)

result = 
  lapply(myFilters, function(x) call(x$op, as.name(x$var), x$val)) %>%
  {filter(df, !!!.)}

...and Shazam!

> result
  color value
1  blue     1
2  blue     3

That's a lot to absorb, so if it isn't immediately apparent what's happening, let me unpack it a bit. Consider:

var = "color"
op = "%in%"
val = "blue"

I'd want to be able to run:

filter(df, color %in% "blue")

and if I also have:

var2 = "value"
op2 = "<="
val2 = 3

I might want to be able to get:

filter(df, color %in% "blue", value <= 3)

The solution uses calls, which are unevaluated expressions (see Hadley's Advanced R book). Basically, make a list of call objects from the variables, and then splice them into dplyr::filter with the !!! operator so that filter evaluates them against the data.

call1 = call(op, as.name(var), val)

Here is the value of call1:

> call1
color %in% "blue"

Let's create another call:

call2 = call(op2, as.name(var2), val2)

Put them in list:

calls = list(call1, call2)

and use !!! to splice the list of calls into filter as separate arguments:

result = filter(df, !!!calls)
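
The steps above can be wrapped in a small function so that any number of filter specifications can be applied in one go. This is only a sketch: the names spec_to_call and filter_by_specs are hypothetical helpers invented for illustration, not part of dplyr or rlang.

```r
library(dplyr)

df <- data.frame(
  color = c("blue", "black", "blue", "blue", "black"),
  value = 1:5
)

# Hypothetical helper: turn one spec (var, op, val) into an unevaluated call,
# e.g. list(var = "value", op = "<=", val = 3) becomes the call `value <= 3`
spec_to_call <- function(spec) {
  call(spec$op, as.name(spec$var), spec$val)
}

# Hypothetical wrapper: build one call per spec and splice them into filter()
filter_by_specs <- function(data, specs) {
  calls <- lapply(specs, spec_to_call)
  filter(data, !!!calls)
}

my_filters <- list(
  list(var = "color", op = "%in%", val = "blue"),
  list(var = "value", op = "<=", val = 3)
)

result <- filter_by_specs(df, my_filters)
```

Because the operator is looked up by name when the call is evaluated, any function that takes the column as its first argument and the value as its second should work as op.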

Upvotes: 3

llewmills

Reputation: 3568

An update. The new dplyr 1.0.0 has some fantastic new functionality that makes solving these sorts of problems far easier. You can read about it in the 'programming' vignette accompanying the new package.

Basically, the .data[[foo]] pronoun lets you pass column names as strings into functions more easily.

So you can do this

filtFunct <- function(d, var, crit) {
  filter(d, .data[[var]] %in% crit)
}

filtFunct(df, "value", c(2,4))

#   color value
# 1 black     2
# 2  blue     4

filtFunct(df, "color", "blue")

#   color value
# 1  blue     1
# 2  blue     3
# 3  blue     4

Upvotes: 5

llewmills

Reputation: 3568

Several of the solutions above did not work for me. Now there is the as.symbol function, which we wrap in !!. Seems a bit simpler, sort of.

set.seed(123)

df <- data.frame( 
  color = c("blue", "black", "blue", "blue", "black"), 
  shape = c("round", "round", "square", "round", "square"),
  value = 1:5)

Now enter the variable as a string into the dplyr functions by passing it through as.symbol() and !!

var <- "color"
filter(df, !!as.symbol(var) == "blue")

#   color  shape value
# 1  blue  round     1
# 2  blue square     3
# 3  blue  round     4

var <- "shape"
df %>% group_by(!!as.symbol(var)) %>% summarise(m = mean(value))

#   shape      m
#   <fct>  <dbl>
# 1 round   2.33
# 2 square  4
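
If several column names are held as strings, the same idea seems to extend naturally: convert them all with rlang::syms() and splice the result with !!!. A minimal sketch, assuming dplyr >= 1.0.0 (for the .groups argument) and a character vector named vars introduced here purely for illustration:

```r
library(dplyr)

df <- data.frame(
  color = c("blue", "black", "blue", "blue", "black"),
  shape = c("round", "round", "square", "round", "square"),
  value = 1:5
)

# vars is an illustrative name; any character vector of column names works
vars <- c("color", "shape")

result <- df %>%
  group_by(!!!rlang::syms(vars)) %>%          # one symbol per string, spliced in
  summarise(m = mean(value), .groups = "drop")
```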

Upvotes: 7

Mark Heckmann

Reputation: 11431

For dplyr versions [0.3 - 0.7) (? - June 2017)

(For more recent dplyr versions, please see other answers to this question)

As of dplyr 0.3, every dplyr function using non-standard evaluation (NSE, see release post and vignette) has a standard evaluation (SE) twin ending in an underscore. These can be used for passing variables. For filter, it is filter_. Using filter_, you may pass the logical condition as a string.

filter_(df, "color=='blue'")

#   color value
# 1  blue     1
# 2  blue     3
# 3  blue     4

Constructing the string with the logical condition is of course straightforward:

l <- paste(var, "==",  "'blue'")
filter_(df, l)
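
Since the underscore verbs were deprecated in dplyr 0.7, a rough equivalent of this string-based approach in later versions is to parse the string into an expression with rlang::parse_expr() and unquote it inside filter(). This is a sketch of that idea rather than an official replacement, and parsing strings should only be done with trusted input:

```r
library(dplyr)

df <- data.frame(
  color = c("blue", "black", "blue", "blue", "black"),
  value = 1:5
)

var  <- "color"
cond <- paste(var, "==", "'blue'")   # builds the string "color == 'blue'"

# Parse the string into an unevaluated expression, then unquote it with !!
result <- filter(df, !!rlang::parse_expr(cond))
```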

Upvotes: 27

takje

Reputation: 2800

As of dplyr 0.7, some things have changed again.

library(dplyr)
df <- data.frame( 
  color = c("blue", "black", "blue", "blue", "black"), 
  value = 1:5)
filter(df, color == "blue")

# it was already possible to use a variable for the value
val <- 'blue'
filter(df, color == val)

# As of dplyr 0.7, new functions were introduced to simplify the situation
col_name <- quo(color) # captures the current environment
df %>% filter((!!col_name) == val)

# Remember to use enquo within a function
filter_col <- function(df, col_name, val){
  col_name <- enquo(col_name) # captures the environment in which the function was called
  df %>% filter((!!col_name) == val)
}
filter_col(df, color, 'blue')

More general cases are explained in the dplyr programming vignette.

Upvotes: 17

Tom Roth

Reputation: 2074

Here is one way to do it using the sym() function in the rlang package:

library(dplyr)

df <- data.frame( 
  main_color = c("blue", "black", "blue", "blue", "black"), 
  secondary_color = c("red", "green", "black", "black", "red"),
  value = 1:5, 
  stringsAsFactors=FALSE
)

filter_with_quoted_text <- function(column_string, value) {
    col_name <- rlang::sym(column_string)
    df1 <- df %>% 
      filter(UQ(col_name) == UQ(value))
    df1
}

filter_with_quoted_text("main_color", "blue")
filter_with_quoted_text("secondary_color", "red")

Upvotes: 5

lukeA

Reputation: 54237

Often asked, but still no easy support afaik. However, with regards to this posting:

eval(substitute(filter(df, var == "blue"), 
                list(var = as.name(var))))
#   color value
# 1  blue     1
# 2  blue     3
# 3  blue     4
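
For comparison, if leaving dplyr syntax is acceptable ("by any means"), plain base R subsetting accepts the column name as a string directly via [[ ]]. A minimal sketch using the data frame from the question:

```r
df <- data.frame(
  color = c("blue", "black", "blue", "blue", "black"),
  value = 1:5
)

var <- "color"

# df[[var]] extracts the column named by the string; the logical vector
# then selects the matching rows
result <- df[df[[var]] == "blue", ]
```

Unlike filter(), this keeps the original row names (1, 3, 4) and would also keep rows where the comparison yields NA, so the two are not exact drop-in replacements for each other.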

Upvotes: 7
