Canovice

Reputation: 10491

dplyr filter on a vector rather than a dataframe in R

This seems like a simple question, but I have not come across a clean solution for it yet. I have a vector in R and I want to remove certain elements from it, but I want to avoid the vector[vector != "thiselement"] notation for a variety of reasons. In particular, here is what I am trying to do:

# this doesn't work
all_states = gsub(" ", "-", tolower(state.name)) %>% filter("alaska")

# this doesn't work either
all_states = gsub(" ", "-", tolower(state.name)) %>% filter(!= "alaska")

# this does work, but I want to avoid this approach to filtering
all_states = gsub(" ", "-", tolower(state.name))
all_states = all_states[all_states != "alaska"]

Can this be done in a simple manner? Thanks in advance for the help!

EDIT - the reason I'm struggling with this is that I'm only finding things online about filtering based on a column of a dataframe, for example:

my_df %>% filter(col != "alaska")

However, I'm working with a vector here, not a dataframe.

Upvotes: 25

Views: 23570

Answers (4)

user2683720

Reputation: 31

An easy way to get the desired result within the tidyverse is to put the vector into a tibble, filter it, and then pull the vector back out.

tibble(myvec = gsub(" ", "-", tolower(state.name))) %>% 
   filter(myvec != "alaska") %>% pull(myvec)

This gives the desired output: [1] "alabama" "arizona" "arkansas" "california" "colorado" ...

Upvotes: 0

Quar

Reputation: 1082

Update

As @r_31415 noted in the comments, packages such as stringr provide functions that can better address this question.

With str_subset(string, pattern, negate=FALSE), one could filter character vectors like

library(stringr)

# Strings that have at least one character that is neither "A" nor "B".
> c("AB", "BA", "ab", "CA") %>% str_subset("[^AB]")
[1] "ab" "CA"


# Strings that do not include characters "A" or "B".
> c("AB", "BA", "ab", "CA") %>% str_subset("[AB]", negate=TRUE)
[1] "ab"

By default, the pattern is interpreted as a regular expression. Therefore, to match a literal pattern that contains special characters such as (, *, and ?, one can wrap the pattern string in the modifier function fixed(literal_string) instead of escaping with double backslashes or using a raw string (available since R 4.0.0).

# escape special character with "\\" (has to escape `\` with itself in a string literal).
> c("(123.5)", "12345") %>% str_subset("\\(123\\.5\\)")
[1] "(123.5)"

# R 4.0.0 supports raw-string, which is handy for regex strings
> c("(123.5)", "12345") %>% str_subset(r"{\(123\.5\)}")
[1] "(123.5)"

# use the fixed() modifier
> c("(123.5)", "12345") %>% str_subset(fixed("(123.5)"))
[1] "(123.5)"


# unexpected results without escaping or the fixed() modifier
> c("(123.5)", "12345") %>% str_subset("(123.5)")
[1] "(123.5)" "12345"
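Applied to the vector from the question, a minimal sketch might look like the following. It assumes stringr 1.4.0 or later (for the negate argument) and anchors the pattern with ^ and $ so that only the exact string "alaska" is dropped, not every string that merely contains it. The anchoring is my choice for illustration; it is not part of the original question.

```r
library(stringr)  # stringr re-exports the %>% pipe

# Build the vector from the question, then drop the element that is
# exactly "alaska"; ^ and $ anchor the regex to the whole string.
all_states <- gsub(" ", "-", tolower(state.name)) %>%
  str_subset("^alaska$", negate = TRUE)

length(all_states)
```

Without the anchors, the pattern "alaska" would also remove any other element that happens to contain that substring.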

Original Answer

Sorry for posting on a 5-month-old question, but I wanted to record a simpler solution.

Package dplyr can filter character vectors in the following ways:

> c("A", "B", "C", "D") %>% .[matches("[^AB]", vars=.)]
[1] "C" "D"
> c("A", "B", "C", "D") %>% .[.!="A"]
[1] "B" "C" "D"

The first approach lets you filter with a regular expression, and the second approach is shorter. It works because dplyr re-exports magrittr's pipe operator %>%, and the placeholder . comes with it, even though other magrittr functions such as extract are not available unless magrittr itself is loaded.

Details of the placeholder . can be found in the help for the forward-pipe operator %>%. The placeholder has three main uses:

  • Using the dot for secondary purposes
  • Using lambda expressions with %>%
  • Using the dot-place holder as lhs

Here we are taking advantage of its 3rd usage.
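For the vector in the question, the same placeholder trick could be written as in the sketch below. It only assumes that magrittr (or any package that re-exports %>%, such as dplyr) is loaded.

```r
library(magrittr)

# The dot stands for the piped-in vector, both as the object being
# subset and inside the logical condition.
all_states <- gsub(" ", "-", tolower(state.name)) %>% .[. != "alaska"]

head(all_states)
```

Because the dot appears directly as an argument of the subsetting call, the pipe does not insert the left-hand side a second time.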

Upvotes: 45

Łukasz Deryło

Reputation: 1870

You may like to try magrittr::extract. e.g.

> library(magrittr)

> c("A", "B", "C", "D") %>% extract(.!="A")
[1] "B" "C" "D"

For more extract-like functions, load the magrittr package and type ?extract, which opens the help page listing the aliases.
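As a rough sketch of how this extends to dropping several elements at once, extract can be combined with %in%. The second state name, "hawaii", is only an illustrative choice and does not come from the question.

```r
library(magrittr)

# Hypothetical list of elements to drop, for illustration only.
drop_these <- c("alaska", "hawaii")

# extract() is an alias for `[`; the dot inside the condition refers
# to the piped-in vector, which is also passed as the first argument.
all_states <- gsub(" ", "-", tolower(state.name)) %>%
  extract(!. %in% drop_these)

length(all_states)
```

If tidyr is also attached, its own extract function masks this one, so calling magrittr::extract explicitly is the safer spelling.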

Upvotes: 22

David Pedack

Reputation: 492

Pretty sure dplyr only really operates on data.frames. Here's a two-line example that coerces the vector to a data.frame and back.

myDf = data.frame(states = gsub(" ", "-", tolower(state.name))) %>% filter(states != "alaska")
all_states = myDf$states

or a gross one-liner:

all_states = (data.frame(states = gsub(" ", "-", tolower(state.name))) %>% filter(states != "alaska"))$states

Upvotes: 4
