Eliminate dataframe rows that match a character string

Question

I have a dataframe rawdata with columns that contain ecological information. I am trying to eliminate all of the rows for which the column LatinName matches a vector of species for which I already have some data, and create a new dataframe with only the species that are missing data. So, what I'd like to do is something like:

matches <- c("Thunnus thynnus", "Balaenoptera musculus", "Homarus americanus") 
# obviously these are a random subset; the real vector has ~16,000 values 
rawdata_missing <- rawdata %>% filter(LatinName != "matches")

This doesn't work because the boolean operator can't be applied to a character string. Alternatively I could do something like this:

rawdata_missing <- filter(rawdata, !grepl(matches, LatinName))

This doesn't work either because !grepl also can't use the character string.

I know there are a lot of ways I could subset rawdata using the rows where LatinName IS in matches, but I can't figure out a neat way to subset rawdata such that LatinName is NOT in matches.

Thanks in advance for the help!

shirewoman2 · Accepted Answer

filteredData <- rawdata[!(rawdata$LatinName %in% matches), ]
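
To see the negated %in% test in action, here is a minimal sketch on toy data. The Region column and the two extra species are made up for illustration; only the three names in matches come from the question.

```r
# Toy stand-in for rawdata; Region and the extra species are hypothetical.
rawdata <- data.frame(
  LatinName = c("Thunnus thynnus", "Gadus morhua",
                "Homarus americanus", "Salmo salar"),
  Region = c("Atlantic", "Atlantic", "Atlantic", "Atlantic"),
  stringsAsFactors = FALSE
)

# Species that already have data (from the question).
matches <- c("Thunnus thynnus", "Balaenoptera musculus", "Homarus americanus")

# %in% returns one TRUE/FALSE per row; ! flips it so only non-matches are kept.
rawdata_missing <- rawdata[!(rawdata$LatinName %in% matches), ]

print(rawdata_missing$LatinName)
# "Gadus morhua" "Salmo salar"
```

If you prefer to stay in the dplyr pipeline from the question, the same condition should work inside filter(), e.g. rawdata %>% filter(!(LatinName %in% matches)). The original attempt compared LatinName against the literal string "matches" rather than the vector, and grepl() only uses the first element of a pattern vector, which is why neither version gave the expected result.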
