Reputation: 727
I have a dataframe rawdata with columns that contain ecological information. I am trying to eliminate all of the rows for which the column LatinName matches a vector of species for which I already have some data, and create a new dataframe with only the species that are missing data. So, what I'd like to do is something like:
matches <- c("Thunnus thynnus", "Balaenoptera musculus", "Homarus americanus")
# obviously these are a random subset; the real vector has ~16,000 values
rawdata_missing <- rawdata %>% filter(LatinName != "matches")
This doesn't work because the boolean operator can't be applied to a character string. Alternatively I could do something like this:
rawdata_missing <- filter(rawdata, !grepl(matches, LatinName))
This doesn't work either because !grepl also can't use the character string.
I know there are a lot of ways I could subset rawdata using the rows where LatinName IS in matches, but I can't figure out a neat way to subset rawdata such that LatinName is NOT in matches.
Thanks in advance for the help!
Upvotes: 2
Views: 109
Reputation: 1587
Another way, using subset, paste, mapply and grepl:
# matches is the character vector from the question (R names are case-sensitive)
filteredData <- subset(rawdata, mapply(grepl, rawdata$LatinName, paste(matches, collapse = "|")) == FALSE)
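If exact (non-regex) matches are what is wanted, negating the %in% operator may be simpler than pasting ~16,000 names into one regular expression. A minimal sketch, assuming the names in matches are spelled exactly as they appear in LatinName; the toy rawdata and its Depth column below are made up for illustration only:

```r
# Hypothetical toy data standing in for the real rawdata (illustration only)
rawdata <- data.frame(
  LatinName = c("Thunnus thynnus", "Gadus morhua",
                "Homarus americanus", "Salmo salar"),
  Depth = c(100, 200, 50, 10),  # made-up column, not from the question
  stringsAsFactors = FALSE
)
matches <- c("Thunnus thynnus", "Balaenoptera musculus", "Homarus americanus")

# %in% returns TRUE for each LatinName found in matches;
# negating it keeps only the rows whose species is NOT in matches
rawdata_missing <- rawdata[!(rawdata$LatinName %in% matches), ]

# Equivalent with dplyr, assuming the package is installed and loaded:
# rawdata_missing <- rawdata %>% filter(!(LatinName %in% matches))

print(rawdata_missing)
```

Note that %in% compares whole strings, so differences in case or stray whitespace will not match, and rows where LatinName is NA are kept because NA %in% matches is FALSE.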
Upvotes: 0