stevec

Reputation: 52328

dplyr / tidy way to filter a vector based on a substring?

We can see some good examples of how to filter a data.frame based on a substring; is there a tidy way of doing this for a vector? (that is, without using grepl() or similar)

Example

I tried what would work on a data.frame:

# Leave only words that don't begin with 'cat'

vec <- c("cat", "catamaran", "dog", "mouse", "catacombs")

vec %>% filter(substr(1, 3) != "cat") # %>% ... etc

but

Error in UseMethod("filter_") : 
  no applicable method for 'filter_' applied to an object of class "character"

Note

We could use something like vec %>% { .[!grepl("cat", .)] }, or more accurately vec %>% { .[substr(., 1, 3) != "cat"] }, but I'm looking for something that:

  1. is more beginner friendly, with more verbally descriptive functions (e.g. a complete novice can probably guess what 'filter' does but possibly not 'grepl')
  2. has less finicky syntax (as few { and } as possible)
  3. pipes more elegantly (e.g. vec %>% filter(...) %>% next operations)
  4. contains as little repetition as possible, noting that the grepl way uses the original vector (denoted by .) twice (as opposed to just once which would be ideal)

Upvotes: 8

Views: 2682

Answers (4)

WaltS

Reputation: 5530

Using purrr to work with vectors

library(purrr)
library(stringr)

vec <- c("cat", "catamaran", "dog", "mouse", "catacombs")
vec %>% discard(.p = str_detect, pattern = "^cat")
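
As a possible alternative to discard(), and assuming stringr 1.4.0 or later is installed (the version that added the negate argument), str_subset() can do the filtering in one call. This is only a sketch; the name no_cat is made up for illustration.

```r
library(stringr)

vec <- c("cat", "catamaran", "dog", "mouse", "catacombs")

# Keep only the elements that do NOT match the anchored pattern "^cat".
# negate = TRUE inverts the match (assumes stringr >= 1.4.0).
no_cat <- str_subset(vec, "^cat", negate = TRUE)

print(no_cat)
```

Because the vector is the first argument, the same call should also work in a pipe, e.g. vec %>% str_subset("^cat", negate = TRUE), with the vector mentioned only once.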

Upvotes: 5

akrun

Reputation: 887251

Using tidyverse, we can convert the vector to a tibble, use str_detect() within filter(), and pull() the values back out

library(dplyr)
library(stringr)
tibble(vec) %>%
      filter(!str_detect(vec, "^cat")) %>%
      pull(vec)
#[1] "dog"   "mouse"

Or with magrittr

vec %>%
     str_detect("^cat") %>%
     `!` %>%
     magrittr::extract(vec, .)
#[1] "dog"   "mouse"

Upvotes: 3

Ronak Shah

Reputation: 389055

I think tidyverse is better suited to data frames and lists than to plain vectors. Pipes help when you chain more than one operation, but here a single function (grep) gives the expected result without any pipe.

grep('^cat', vec, value = TRUE, invert = TRUE)
#[1] "dog"   "mouse"

Or convert the vector to a data frame and then use either of the following

library(dplyr)
library(tibble)

vec %>% enframe() %>% filter(!startsWith(value, 'cat'))

Or

vec %>% enframe() %>% filter_at(vars(value), any_vars(!startsWith(., 'cat')))
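
Note that both enframe() variants return a tibble rather than a vector, and that filter_at() is marked as superseded in dplyr 1.0.0 and later. A minimal sketch of how to get back to a plain character vector, assuming dplyr and tibble are installed (the name result is illustrative only):

```r
library(dplyr)
library(tibble)

vec <- c("cat", "catamaran", "dog", "mouse", "catacombs")

result <- vec %>%
  enframe(name = NULL) %>%               # one-column tibble; the column is called "value"
  filter(!startsWith(value, "cat")) %>%  # drop rows whose value begins with "cat"
  pull(value)                            # extract the column as a character vector

print(result)
```

Passing name = NULL simply drops the index column that enframe() would otherwise add; the filter step is the same as in the first variant above.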

Upvotes: 11

bgaerber

Reputation: 76

If you don't mind using a different package, you can use the stri_detect_fixed function from the stringi package.

install.packages('stringi')
library(stringi)

vec <- c("cat", "catamaran", "dog", "mouse", "catacombs")
vec[stri_detect_fixed(vec, 'cat')]

Output:

[1] "cat"       "catamaran" "catacombs"

You should then be able to pipe this to whatever commands you would like.
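
The snippet above keeps every element that contains 'cat' anywhere, which is the opposite of what the question asks for. A hedged sketch of the inverse, restricted to the start of the string, using stri_startswith_fixed() from the same package (the name kept is illustrative only):

```r
library(stringi)

vec <- c("cat", "catamaran", "dog", "mouse", "catacombs")

# TRUE for elements that begin with the literal string "cat"
starts_with_cat <- stri_startswith_fixed(vec, "cat")

# Negate the logical vector to keep everything else
kept <- vec[!starts_with_cat]

print(kept)
```

This still indexes vec explicitly, so it does not remove the repetition the question mentions; it only corrects the direction and position of the match.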

Upvotes: 5
