Reputation: 14364
I have a variable with the same name as a column in a dataframe:
library(dplyr)
df <- data.frame(a=c(1,2,3), b=c(4,5,6))
b <- 5
I want to get the rows where df$b == b, but dplyr interprets this as df$b == df$b:
df %>% filter(b == b) # interpreted as df$b == df$b
# a b
# 1 1 4
# 2 2 5
# 3 3 6
If I change the variable name, it works:
B <- 5
df %>% filter(b == B) # interpreted as df$b == B
# a b
# 1 2 5
I'm wondering if there is a better way to tell filter that b refers to an outside variable.
Upvotes: 38
Views: 20978
Reputation: 3
For those interested in using a column whose name is stored in a variable, I find this solution the quickest and easiest to understand:
df %>% filter(!!as.name(column_name) == !!b)
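A self-contained sketch of the line above, reusing the df and b from the question. The name column_name is assumed here to hold the column name as a string, and the second call with the .data and .env pronouns is an alternative spelling added for comparison, not part of the original answer:

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
b <- 5
column_name <- "b"  # hypothetical: the column to filter on, held as a string

# as.name() turns the string into a symbol; !! injects it as a column reference.
# !!b inlines the value of the outside variable b before filter() evaluates anything.
res_sym <- df %>% filter(!!as.name(column_name) == !!b)

# Alternative spelling: look the column up by name via .data, the variable via .env.
res_pronoun <- df %>% filter(.data[[column_name]] == .env$b)
```

Both calls should return the single row where column b equals 5.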
Upvotes: 0
Reputation: 18612
rlang, which is imported with dplyr, has the .env and .data pronouns for exactly this situation, when you need to be explicit because of data masking. To explicitly reference columns in your data frame use .data, and to explicitly reference your environment use .env:
library(dplyr)
df %>%
filter(.data$b == .env$b) # b == .env$b works the same here
# a b
# 1 2 5
From the documentation:
Note that .data is only a pronoun, it is not a real data frame. This means that you can't take its names or map a function over the contents of .data. Similarly, .env is not an actual R environment.
You do not necessarily need to use .data$b here because the evaluation searches the data frame for a column with that name first (as you found out).
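The same name clash often shows up inside functions, where an argument shares its name with a column. A minimal sketch, assuming a helper called rows_where_b_is (a hypothetical name used only for illustration):

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Hypothetical helper: the argument b deliberately has the same name as a column.
rows_where_b_is <- function(data, b) {
  # .data$b is the column; .env$b is the function argument.
  data %>% filter(.data$b == .env$b)
}

matched <- rows_where_b_is(df, 5)
```

Because .env looks the name up outside the data frame, the argument is found even though a column of the same name exists.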
Upvotes: 10
Reputation: 869
Recently I have found this to be an elegant solution to this problem, although I'm just starting to wrap my head around how it works.
df %>% filter(b == !!b)
which is syntactic sugar for the older functional form (UQ() has since been deprecated in rlang in favor of !!):
df %>% filter(b == UQ(b))
At a high level, the unquote operation (UQ, or !!) causes its contents to be evaluated before the filter expression itself, so b is looked up in the calling environment rather than within the data frame.
This is described in this chapter of Advanced R, on 'quasi-quotation'. This chapter also includes a few solutions to similar problems related to non-standard evaluation (NSE).
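To see what unquoting does to an expression, it can help to capture one without evaluating it. The sketch below uses rlang's expr() purely for illustration; filter captures its arguments through its own machinery, so treat this as an analogy rather than a description of dplyr's internals:

```r
library(rlang)

b <- 5

# expr() captures the expression unevaluated, except for the parts marked with !!.
captured <- expr(b == !!b)
print(captured)  # prints: b == 5

# The left-hand side is still the symbol b; the right-hand side is now the value 5.
lhs <- captured[[2]]
rhs <- captured[[3]]
```

By the time the comparison runs, the right-hand side is already a constant, so only the left-hand b is left to be resolved against the data frame.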
Upvotes: 56
Reputation: 1721
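You could use the get function to fetch the value of the variable from the environment. In recent dplyr versions a bare get("b") may find the column first, so pass the environment explicitly:
df %>% filter(b == get("b", envir = globalenv())) # Note the "" around b; assumes b is defined in the global environment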
Upvotes: 20
Reputation: 35187
As a general solution, you can use the SE (standard evaluation) version of filter, which is filter_ (note that the underscore verbs have been deprecated since dplyr 0.7.0). In this case, things get a bit confusing because you are mixing a column variable and an 'external' constant in a single expression. Here is how you do that with the interp function:
library(lazyeval)
df %>% filter_(interp(~ b == x, x = b))
If you would like to pass the values for b as a named list, you can write:
df %>% filter_(interp(~ b == x, .values = list(x = b)))
Upvotes: 8