nachocab

Reputation: 14364

How to use a variable in dplyr::filter?

I have a variable with the same name as a column in a dataframe:

df <- data.frame(a=c(1,2,3), b=c(4,5,6))
b <- 5

I want to get the rows where df$b == b, but dplyr interprets this as df$b == df$b:

df %>% filter(b == b) # interpreted as df$b == df$b
#   a b
# 1 1 4
# 2 2 5
# 3 3 6

If I change the variable name, it works:

B <- 5
df %>% filter(b == B) # interpreted as df$b == B
#   a b
# 1 2 5

I'm wondering if there is a better way to tell filter that b refers to an outside variable.

Upvotes: 38

Views: 20978

Answers (5)

Evgeniy

Reputation: 3

For those interested in how to filter on a column whose name is stored in a variable, I find this the quickest and most understandable solution:

df %>% filter(!!as.name(column_name) == !!b)
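A possible alternative for the same task, sketched under the assumption that a recent dplyr is installed: the .data pronoun accepts a string subscript, and .env pins the value to the calling environment. The names column_name and result are illustrative only.

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
b <- 5
column_name <- "b"  # illustrative: the column to filter on, stored as a string

# .data[[column_name]] looks the column up by its name;
# .env$b takes b from the calling environment, not from df
result <- df %>% filter(.data[[column_name]] == .env$b)
result
```

This avoids converting the string to a symbol by hand, at the cost of being slightly more verbose.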

Upvotes: 0

LMc

Reputation: 18612

rlang, which is imported with dplyr, has the .env and .data pronouns for exactly this situation, where data masking forces you to be explicit. Use .data to refer explicitly to columns in your data frame and .env to refer explicitly to your environment:

library(dplyr)
df %>% 
  filter(.data$b == .env$b) # b == .env$b works the same here

#   a b
# 1 2 5

From the documentation:

Note that .data is only a pronoun, it is not a real data frame. This means that you can't take its names or map a function over the contents of .data. Similarly, .env is not an actual R environment.

You do not necessarily need to use .data$b here because the evaluation searches the data frame for a column with that name first (as you found out).
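The pronouns are especially handy inside a function whose argument shares a column's name. A minimal sketch, assuming dplyr is installed; filter_by_b is a hypothetical helper, not part of any package:

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Hypothetical helper: the argument b deliberately shares the column's name
filter_by_b <- function(data, b) {
  # .data$b is the column; .env$b is the function argument
  data %>% filter(.data$b == .env$b)
}

res <- filter_by_b(df, 5)
res
```

Without .env, the comparison inside the helper would be between the column and itself, and every row would be kept.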

Upvotes: 10

jackinovik

Reputation: 869

Recently I have found this to be an elegant solution to this problem, although I'm just starting to wrap my head around how it works.

df %>% filter(b == !!b)

which, in older rlang versions, was syntactic sugar for

df %>% filter(b == UQ(b))

UQ() has since been deprecated in rlang, so prefer !!. At a high level, the unquote operation causes its contents to be evaluated before the filter operation, so that b is not looked up within the data.frame.

This is described in the 'Quasiquotation' chapter of Advanced R, which also includes a few solutions to similar problems related to non-standard evaluation (NSE).
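To see what unquoting does, it can help to build the expression without evaluating it. A small sketch using rlang's expr(); the names quoted and unquoted are illustrative:

```r
library(rlang)

b <- 5

quoted   <- expr(b == b)    # nothing is unquoted: both sides stay symbols
unquoted <- expr(b == !!b)  # !!b is replaced by the current value of b

print(quoted)    # b == b
print(unquoted)  # b == 5
```

By the time filter evaluates the second expression, the right-hand side is already the constant 5, so there is nothing left to be masked by the column.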

Upvotes: 56

nist

Reputation: 1721

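You could use the get function to fetch the value of the variable from the environment. In recent dplyr versions a bare get("b") is evaluated inside the data mask and can find the column first, so name the environment explicitly:

df %>% filter(b == get("b", envir = globalenv())) # Note the "" around b; assumes b lives in the global environment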

Upvotes: 20

Axeman

Reputation: 35187

As a general solution, you can use the SE (standard evaluation) version of filter, which is filter_. (Note that filter_ and the other underscore verbs have been deprecated since dplyr 0.7.0.) In this case, things get a bit confusing because you are mixing a column and an 'external' constant in a single expression. Here is how you do that with the interp function:

library(lazyeval)
df %>% filter_(interp(~ b == x, x = b))

If you would like to pass several values at once, supply them as a named list through .values:

df %>% filter_(interp(~ b == x, .values = list(x = b)))

Upvotes: 8
