Alexa Fredston

Reputation: 727

Eliminate dataframe rows that match a character string

I have a dataframe rawdata with columns that contain ecological information. I am trying to eliminate all of the rows for which the column LatinName matches a vector of species for which I already have some data, and create a new dataframe with only the species that are missing data. So, what I'd like to do is something like:

matches <- c("Thunnus thynnus", "Balaenoptera musculus", "Homarus americanus") 
# obviously these are a random subset; the real vector has ~16,000 values 
rawdata_missing <- rawdata %>% filter(LatinName != "matches") 

This doesn't work because the boolean operator can't be applied to a character string. Alternatively I could do something like this:

rawdata_missing <- filter(rawdata, !grepl(matches, LatinName))

This doesn't work either because !grepl also can't use the character string.

I know there are a lot of ways I could subset rawdata using the rows where LatinName IS in matches, but I can't figure out a neat way to subset rawdata such that LatinName is NOT in matches.

Thanks in advance for the help!

Upvotes: 2

Views: 109

Answers (2)

Gaurav

Reputation: 1587

Another way is to use subset, paste, mapply and grepl:

filteredData <- subset(rawdata, mapply(grepl, rawdata$LatinName, paste(matches, collapse = "|")) == FALSE)
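A more direct variant of the same idea is sketched below, in case it helps. grepl() is already vectorised over the strings it searches, so the joined pattern can be passed once without mapply. The toy rawdata here is invented purely for illustration. Note that a regex alternation matches substrings, so a subspecies name such as "Thunnus thynnus orientalis" is dropped as well, and any regex metacharacters in the names would need escaping.

```r
# Toy stand-in for the asker's data; the rows are invented for illustration.
rawdata <- data.frame(
  LatinName = c("Thunnus thynnus", "Gadus morhua", "Homarus americanus",
                "Thunnus thynnus orientalis"),
  stringsAsFactors = FALSE
)
matches <- c("Thunnus thynnus", "Balaenoptera musculus", "Homarus americanus")

# Join the names into one alternation pattern of the form "A|B|C".
pattern <- paste(matches, collapse = "|")

# TRUE wherever any of the names occurs anywhere inside LatinName.
is_match <- grepl(pattern, rawdata$LatinName)

# Keep only the rows that did not match.
filteredData <- rawdata[!is_match, , drop = FALSE]
print(filteredData$LatinName)
```

With ~16,000 names the joined pattern gets very long, so exact matching is probably the better fit if the names are expected to be identical.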

Upvotes: 0

shirewoman2

Reputation: 1928

filteredData <- rawdata[!(rawdata$LatinName %in% matches), ]
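As a small worked sketch of the line above: %in% returns one TRUE/FALSE per row, telling whether that row's LatinName appears exactly in the vector, and ! flips it to "not in". The toy rows and the Depth column below are made up for illustration only. The same condition should also work inside dplyr's filter(), i.e. filter(rawdata, !(LatinName %in% matches)).

```r
# Toy stand-in for the asker's data; rows and the Depth column are invented.
rawdata <- data.frame(
  LatinName = c("Thunnus thynnus", "Gadus morhua",
                "Homarus americanus", "Salmo salar"),
  Depth = c(100, 200, 50, 10),
  stringsAsFactors = FALSE
)
matches <- c("Thunnus thynnus", "Balaenoptera musculus", "Homarus americanus")

# One logical value per row: is this species already covered?
is_known <- rawdata$LatinName %in% matches

# Negate to keep only the species that are still missing data.
rawdata_missing <- rawdata[!is_known, , drop = FALSE]
print(rawdata_missing)
```

Because %in% compares whole strings rather than patterns, it needs no regex escaping and copes comfortably with a vector of ~16,000 names.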

Upvotes: 2
