Geet

Reputation: 2575

dplyr: filter_ with character condition not working

Here is my data:

df <- tibble::tribble(
  ~A,  ~B,  ~C,  ~D,
  2L, "a", "e", 2L,
  4L, "a", "f", NA_integer_,
  4L, "b", "g", NA_integer_,
  4L, "b", "h", NA_integer_
  )

df$B <- as.factor(df$B) 
df$A <- as.factor(as.character(df$A)) 

Here is my filter condition as a character:

remove2 <- "as.integer(A)!=2L"

I just want to remove the observations with A == 2, but the following code keeps that row and drops the others instead. Why?

df %>% dplyr::filter_(remove2)

I want to use filter_ because it accepts the condition as a character string. A solution that uses filter (the version without the underscore) and still takes the condition as a string would also work.

Upvotes: 1

Views: 2892

Answers (3)

coffeinjunky

Reputation: 11514

Try the following:

remove2 <- "as.numeric(as.character(A))!=2L"

df %>% dplyr::filter_(remove2)

# A tibble: 3 x 4
  A     B     C         D
  <fct> <fct> <chr> <int>
1 4     a     f        NA
2 4     b     g        NA
3 4     b     h        NA

Note that a factor stores integer codes for its levels, not the values it displays, so as.integer() returns those codes. See:

 as.integer(df$A)
 [1] 1 2 2 2

To get the values of the factor "as shown", use as.numeric(as.character(.)).
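The difference between the codes and the labels can be checked without dplyr. The following is a minimal base R sketch; the vector A is an assumption that mirrors column A from the question, and keep_wrong / keep_right are names chosen here for illustration.

```r
# Base R only; the values mirror column A from the question.
A <- factor(c("2", "4", "4", "4"))

levels(A)                    # the labels, sorted: "2" "4"
as.integer(A)                # the internal codes: 1 2 2 2
as.integer(as.character(A))  # the labels converted to numbers: 2 4 4 4

# The original condition tests the codes: the first row has code 1, so it
# passes the filter, while the rows with code 2 are dropped.
keep_wrong <- as.integer(A) != 2L

# The corrected condition tests the labels instead.
keep_right <- as.integer(as.character(A)) != 2L
```

This is why the original filter kept exactly the row it was meant to remove: the label "4" happens to have code 2.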

Other answers have pointed out that the underscore functions have been deprecated (though they still work). To achieve this in a way that does not depend on dplyr at all, it might be a good idea to use simple base R:

df[which(df[["A"]] != 2L),]
# A tibble: 3 x 4
  A     B     C         D
  <fct> <fct> <chr> <int>
1 4     a     f        NA
2 4     b     g        NA
3 4     b     h        NA

Upvotes: 3

Aurèle

Reputation: 12819

Code as a string is an anti-pattern. It raises the question: where does the string come from?

If it's you, the developer, typing it, it's both more difficult to write (you don't benefit from IDE features such as auto-completion) and much more prone to bugs (you can write syntactically invalid code that won't get caught until it's actually parsed and evaluated, possibly much later, raising harder-to-understand errors).

If it's input from a user that is not you, it's a major security hole.

You could do instead:

remove2 <- quote(as.numeric(as.character(A)) != 2L)

filter(df, !! remove2)

(!! is the "unquote" operator in the tidyeval framework).

That is not completely satisfying either (still a code smell, in my opinion), because it's rare to have to unquote entire pieces of code. Usually only a variable name needs to be dynamic.
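For the more common case where only the column name (and perhaps the value) varies, one possible sketch uses dplyr's .data pronoun instead of quoting a whole expression. The names col_name and drop_value are hypothetical, and the data frame is a cut-down copy of the one in the question.

```r
library(dplyr)

# Cut-down version of the question's data (assumption: only A and C matter here).
df <- tibble::tibble(
  A = factor(c("2", "4", "4", "4")),
  C = c("e", "f", "g", "h")
)

# Hypothetical inputs: only the column name and the value are dynamic.
col_name <- "A"
drop_value <- "2"

# .data[[col_name]] looks the column up by name. Comparing a factor with a
# string compares against its labels, so no integer conversion is needed.
result <- filter(df, .data[[col_name]] != drop_value)
```

No code is parsed from a string here, so the strings can only ever select a column or supply a value.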

Upvotes: 3

www

Reputation: 39154

Others have explained the cause of this issue: a factor is stored internally as integer codes, which can differ from the values it displays. The other thing I want to point out is that filter_ has been deprecated since dplyr 0.7. Instead, we can evaluate the string inside the filter function, using either of the following two options.

remove2 <- "as.integer(as.character(A)) != 2L"

library(dplyr)
library(rlang)

df %>% filter(eval(parse(text = remove2)))
# # A tibble: 3 x 4
#   A     B     C         D
#   <fct> <fct> <chr> <int>
# 1 4     a     f        NA
# 2 4     b     g        NA
# 3 4     b     h        NA

df %>% filter(eval(parse_expr(remove2)))
# # A tibble: 3 x 4
#   A     B     C         D
#   <fct> <fct> <chr> <int>
# 1 4     a     f        NA
# 2 4     b     g        NA
# 3 4     b     h        NA
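A third variant, sketched here as an alternative rather than as part of the answer above, parses the string once and then unquotes the resulting expression with !!, so no explicit eval() call is needed. The data frame is a cut-down copy of the one in the question, and cond is a name chosen for illustration.

```r
library(dplyr)
library(rlang)

# Cut-down version of the question's data (assumption: only A and C matter here).
df <- tibble::tibble(
  A = factor(c("2", "4", "4", "4")),
  C = c("e", "f", "g", "h")
)

remove2 <- "as.integer(as.character(A)) != 2L"

# parse_expr() turns the string into a single unevaluated expression.
cond <- parse_expr(remove2)

# !! splices that expression into the filter() call, where it is evaluated
# against the columns of df.
result <- filter(df, !!cond)
```

As with the eval() versions, the string is still executed as code, so this is only appropriate when the string comes from a trusted source.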

Upvotes: 3
