Reputation: 2575
Here is my data:
df <- tibble::tribble(
~A, ~B, ~C, ~D,
2L, "a", "e", 2L,
4L, "a", "f", NA_integer_,
4L, "b", "g", NA_integer_,
4L, "b", "h", NA_integer_
)
df$B <- as.factor(df$B)
df$A <- as.factor(as.character(df$A))
Here is my filter condition as a character:
remove2 <- "as.integer(A)!=2L"
I just want to remove the observations where A == 2, but the following code keeps them instead. Why?
df %>% dplyr::filter_(remove2)
I want to use filter_ because it accepts the condition as a character string. A solution that uses filter (the version without the underscore) and still takes the condition as a character string would also work.
Upvotes: 1
Views: 2892
Reputation: 11514
Try the following:
remove2 <- "as.numeric(as.character(A))!=2L"
df %>% dplyr::filter_(remove2)
# A tibble: 3 x 4
A B C D
<fct> <fct> <chr> <int>
1 4 a f NA
2 4 b g NA
3 4 b h NA
Note that a factor is stored as integer codes that index its levels, so as.integer() returns those codes rather than the displayed values. See
as.integer(df$A)
[1] 1 2 2 2
To get the values of the factor "as shown", use as.numeric(as.character(.)).
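To make the difference between codes and labels concrete, here is a minimal base R sketch. It rebuilds only column A from the question as a standalone vector; the variable names `codes`, `labels`, and `labels_via_levels` are made up for illustration.

```r
# A factor stores integer codes that index into its sorted levels.
A <- factor(c("2", "4", "4", "4"))

levels(A)                                      # the labels: "2" "4"
codes <- as.integer(A)                         # positions in levels(A), not the labels
labels <- as.integer(as.character(A))          # the labels converted to integers
labels_via_levels <- as.integer(levels(A))[A]  # same values, converting each level once

# The original condition compares the codes, so it drops the rows labelled "4"
# (code 2) and keeps the row labelled "2" (code 1).
keep_codes <- codes != 2L
keep_labels <- labels != 2L
print(data.frame(label = as.character(A), codes, keep_codes, keep_labels))
```

Printing the two logical vectors side by side shows why the filter in the question kept exactly the row it was supposed to remove.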
Other answers have pointed out that the underscore functions have been deprecated (though they still work). To avoid depending on them at all, it might be a good idea to use plain base R:
df[which(df[["A"]] != 2L),]
# A tibble: 3 x 4
A B C D
<fct> <fct> <chr> <int>
1 4 a f NA
2 4 b g NA
3 4 b h NA
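If the condition still has to arrive as a character string, as the question asks, the same base R idea could be sketched as follows. This is only one possible approach, it assumes the string comes from a trusted source, and it uses a plain data.frame in place of the tibble; `keep` and `result` are names chosen for illustration.

```r
# Plain data.frame stand-in for the tibble in the question.
df <- data.frame(
  A = factor(c("2", "4", "4", "4")),
  B = factor(c("a", "a", "b", "b")),
  C = c("e", "f", "g", "h"),
  D = c(2L, NA, NA, NA)
)

# Condition as a string; note the as.character() step before as.integer().
remove2 <- "as.integer(as.character(A)) != 2L"

# Parse the string and evaluate it with the data frame's columns in scope.
keep <- eval(parse(text = remove2), envir = df)

# which() drops NA results instead of producing all-NA rows.
result <- df[which(keep), , drop = FALSE]
print(result)
```

Because this evaluates arbitrary code, it is not suitable for strings that come from untrusted input.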
Upvotes: 3
Reputation: 12819
Code as a string is an anti-pattern. It raises the question: where does the string come from?
If it's you, the developer, typing it, it's both more difficult to write (you don't benefit from IDE features such as auto-completion) and much more prone to bugs (you can write syntactically invalid code that won't be caught until it is actually parsed and evaluated, possibly much later, raising errors that are harder to understand).
If it's input from a user that is not you, it's a major security hole.
You could do instead:
remove2 <- quote(as.numeric(as.character(A)) != 2L)
filter(df, !! remove2)
(!! is the "unquote" operator in the tidyeval framework.)
That said, it's not completely satisfying either (still a code smell, in my opinion): it's rare to have to unquote entire pieces of code; usually it's just a variable name.
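For the more common case where only the column name varies, one option is dplyr's `.data` pronoun, which looks a column up by a string without parsing any code. This is a hedged sketch, not necessarily what the answer had in mind: `col`, `value`, and `kept` are hypothetical names, and a plain data.frame with two of the columns stands in for the tibble.

```r
library(dplyr)

df <- data.frame(
  A = factor(c("2", "4", "4", "4")),
  C = c("e", "f", "g", "h")
)

col <- "A"     # hypothetical: column name supplied as a string
value <- "2"   # hypothetical: factor label to exclude

# .data[[col]] looks the column up by name; as.character() compares the
# factor's labels rather than its integer codes.
kept <- filter(df, as.character(.data[[col]]) != value)
print(kept)
```

Only data (a column name and a value) is passed around as strings here, so nothing user-supplied is ever evaluated as code.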
Upvotes: 3
Reputation: 39154
Others have explained the cause of this issue: a factor is coded internally as integers, which can differ from the values it displays. The other thing I want to point out is that filter_ has been deprecated since dplyr 0.7. So we can instead evaluate the string with the filter function, using either of the following two options.
remove2 <- "as.integer(as.character(A)) != 2L"
library(dplyr)
library(rlang)
df %>% filter(eval(parse(text = remove2)))
# # A tibble: 3 x 4
# A B C D
# <fct> <fct> <chr> <int>
# 1 4 a f NA
# 2 4 b g NA
# 3 4 b h NA
df %>% filter(eval(parse_expr(remove2)))
# # A tibble: 3 x 4
# A B C D
# <fct> <fct> <chr> <int>
# 1 4 a f NA
# 2 4 b g NA
# 3 4 b h NA
Upvotes: 3