Reputation: 10491
This seems like a simple question, but I have not come across a clean solution for it yet. I have a vector in R and I want to remove certain elements from it; however, I want to avoid the vector[vector != "thiselement"] notation for a variety of reasons. In particular, here is what I am trying to do:
# this doesn't work
all_states = gsub(" ", "-", tolower(state.name)) %>% filter("alaska")
# this doesn't work either
all_states = gsub(" ", "-", tolower(state.name)) %>% filter(!= "alaska")
# this does work, but I want to avoid this approach to filtering
all_states = gsub(" ", "-", tolower(state.name))
all_states = all_states[all_states != "alaska"]
Can this be done in a simple manner? Thanks in advance for the help!
EDIT - the reason I'm struggling with this is because I'm only finding things online regarding filtering based on a column of a dataframe, for example:
my_df %>% filter(col != "alaska")
However, I'm working with a vector here, not a dataframe.
Upvotes: 25
Views: 23570
Reputation: 31
An easy way to the desired result within the tidyverse is to put the vector into a tibble and then pull out the vector.
tibble(myvec = gsub(" ", "-", tolower(state.name))) %>%
filter(myvec != "alaska") %>% pull(myvec)
This gives the desired output: [1] "alabama" "arizona" "arkansas" "california" "colorado" ...
Upvotes: 0
Reputation: 1082
Update
As @r_31415 noted in the comments, packages such as stringr provide functions that can better address this question.
With str_subset(string, pattern, negate=FALSE), one can filter character vectors like this:
library(stringr)
# Strings that have at least one character that is neither "A" nor "B".
> c("AB", "BA", "ab", "CA") %>% str_subset("[^AB]")
[1] "ab" "CA"
# Strings that do not include characters "A" or "B".
> c("AB", "BA", "ab", "CA") %>% str_subset("[AB]", negate=TRUE)
[1] "ab"
By default, the pattern is interpreted as a regular expression. Therefore, to search for literal patterns that contain special characters like (, *, and ?, one can wrap the pattern string in the modifier function fixed(literal_string) instead of escaping with double backslashes or using a raw string (available since R 4.0.0):
# escape special character with "\\" (has to escape `\` with itself in a string literal).
> c("(123.5)", "12345") %>% str_subset("\\(123\\.5\\)")
[1] "(123.5)"
# R 4.0.0 supports raw-string, which is handy for regex strings
> c("(123.5)", "12345") %>% str_subset(r"{\(123\.5\)}")
[1] "(123.5)"
# use the fixed() modifier
> c("(123.5)", "12345") %>% str_subset(fixed("(123.5)"))
[1] "(123.5)"
## unexpected results if without escaping or the "fixed()" modifier
> c("(123.5)", "12345") %>% str_subset("(123.5)")
[1] "(123.5)" "12345"
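Applied to the vector from the question, a minimal sketch might look like the following. It assumes stringr 1.4.0 or later (for the negate argument) and anchors the pattern with ^ and $ so that only the exact string "alaska" is dropped; the name all_states is taken from the question.

```r
library(stringr)
library(magrittr)  # loaded explicitly for %>%

# Build the lower-case, hyphenated state names, then drop the exact match "alaska".
all_states <- gsub(" ", "-", tolower(state.name)) %>%
  str_subset("^alaska$", negate = TRUE)

length(all_states)        # 49 of the 50 built-in state names remain
"alaska" %in% all_states  # FALSE
```

Without the anchors, the pattern would also drop any element that merely contains "alaska", which does not matter for this data but could for other vectors.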
Original Answer
Sorry for posting on a 5-month-old question, but I wanted to record a simpler solution.
The dplyr package can filter character vectors in the following ways:
> c("A", "B", "C", "D") %>% .[matches("[^AB]", vars=.)]
[1] "C" "D"
> c("A", "B", "C", "D") %>% .[.!="A"]
[1] "B" "C" "D"
The first approach lets you filter with a regular expression, and the second is shorter. Both work because dplyr re-exports the %>% operator from magrittr. Loading dplyr alone does not make other magrittr functions such as extract available, but the placeholder . still works, because it is part of the pipe itself.
Details of the placeholder . can be found in the help page of the forward-pipe operator %>%, which lists three main uses:
- Using the dot for secondary purposes
- Using lambda expressions with %>%
- Using the dot-place holder as lhs
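Here we are taking advantage of the first usage: the dot appears both as the object being subset and, a second time, inside the nested expression . != "A".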
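For the vector in the question, the same placeholder idiom can be sketched as follows. Only magrittr is loaded here, on the assumption that the pipe is all that is needed; loading dplyr instead should behave the same way.

```r
library(magrittr)

# The dot stands for the piped-in vector, both as the object being subset
# and inside the comparison that builds the logical index.
all_states <- gsub(" ", "-", tolower(state.name)) %>% .[. != "alaska"]

"alaska" %in% all_states  # FALSE
```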
Upvotes: 45
Reputation: 1870
You may like to try magrittr::extract, e.g.
> library(magrittr)
> c("A", "B", "C", "D") %>% extract(.!="A")
[1] "B" "C" "D"
For more extract-like functions, load the magrittr package and type ?aliases.
Upvotes: 22
Reputation: 492
Pretty sure dplyr only really operates on data.frames. Here's a two-line example that coerces the vector to a data.frame and back.
myDf = data.frame(states = gsub(" ", "-", tolower(state.name))) %>% filter(states != "alaska")
all_states = myDf$states
or a gross one liner:
all_states = (data.frame(states = gsub(" ", "-", tolower(state.name))) %>% filter(states != "alaska"))$states
Upvotes: 4