Reputation: 1234
I can't find an easy way to filter a data.frame on a factor column. I thought I could use str_detect to treat the factor as strings. I want to filter on df$kind so that it excludes flow-delivery, storage, and flow-channel. I could maybe add a column with mutate(kind2 = as.character(kind)) and filter on that, but I'd rather not have the redundancy, and I'm sure I'm missing the obvious.
library(dplyr)
plot_monoth_ts <- function(df, yearmon, rawval, rawunit, dv, study, yrmin, yrmax) {
  df %>%
    filter(str_detect(!kind, 'flow-delivery|storage|flow-channel')) %>%
    ggplot(aes(x = yearmon, y = rawval, color = study, linetype = dv)) +
    geom_line()
}
which returns this warning:
Warning message:
In Ops.factor(kind) : ‘!’ not meaningful for factors
Any tips greatly appreciated.
thank you, Dave
Upvotes: 1
Views: 735
Reputation: 1659
You're over-thinking it! :> No character conversion is necessary. As long as the factor has a label associated with each of its levels, you can refer to the levels as if they were strings.
iris %>% sample_n(5)
# Note that 'Species' is a Factor with 3 levels.
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 56 5.7 2.8 4.5 1.3 versicolor
# 44 5.0 3.5 1.6 0.6 setosa
# 104 6.3 2.9 5.6 1.8 virginica
# 123 7.7 2.8 6.7 2.0 virginica
# 149 6.2 3.4 5.4 2.3 virginica
omitted <- c("versicolor", "setosa")
filter(iris, !(Species %in% omitted)) %>% sample_n(5)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 22 5.6 2.8 4.9 2.0 virginica
# 34 6.3 2.8 5.1 1.5 virginica
# 41 6.7 3.1 5.6 2.4 virginica
# 17 6.5 3.0 5.5 1.8 virginica
# 19 7.7 2.6 6.9 2.3 virginica
Note the !(x %in% y) construct.
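Applied to the question's kind column, the same idea might look like the sketch below. The data frame here is a made-up stand-in (the values "demand" and "inflow" and the rawval numbers are assumptions for illustration, not the asker's data). It also shows a regex variant in which the negation is applied to the result of the match rather than to the factor itself, which is what triggered the Ops.factor warning.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data; only the 'kind' column matters here.
df <- data.frame(
  kind   = factor(c("flow-delivery", "storage", "flow-channel", "demand", "inflow")),
  rawval = c(1, 2, 3, 4, 5)
)

excluded <- c("flow-delivery", "storage", "flow-channel")

# %in% matches the factor's labels against the character vector,
# so no extra as.character() column is needed.
kept <- filter(df, !(kind %in% excluded))

# Regex alternative: negate the logical result of the match, not the factor.
kept_regex <- filter(
  df,
  !grepl("flow-delivery|storage|flow-channel", as.character(kind))
)
```

Note that the filtered factor still carries all five levels in levels(kept$kind); droplevels(kept) removes the unused ones if they get in the way.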
Quick comparison of speed:
library(microbenchmark)
microbenchmark(filter(iris, !(Species %in% c("versicolor", "setosa"))))
# Unit: microseconds
# min lq mean median uq max
# 568.189 575.8505 600.3869 580.8085 603.3435 870.7620
microbenchmark(filter(iris, !str_detect(as.character(Species), "versicolor|setosa")))
# Unit: microseconds
# min lq mean median uq max
# 620.169 633.6910 671.0874 656.8275 687.325 928.1510
As expected, converting to character and then using regex pattern-matching is slower (about 13% at the median in this run), even on a small dataset like iris.
Upvotes: 2