Subsetting a dataframe that contains words from a list in R

Question

I have a database (let's call it "DBA") of locations for species observations, approx. 32k rows long, and another database ("DBB") containing approx. 8.7k names of locations within the study area. I need to develop a script that creates a subset ("DBC") of DBA consisting only of entries that contain any of the names listed in DBB. It should check each of the 32k rows for the name in the first position of DBB, then the second position, and so on. More than one entry in DBA may contain the words "empire state".

That means that if there's an entry in DBB called "empire state", all rows containing these words would be included in DBC. Ideally, this script would also consider entries like "somewhere near empire state" or "empire state building". If this is not possible, an exact match would suffice.

I know subset() would deliver exactly what I want if I had only one location name, such as:

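library(tibble)  # provides as_tibble()

DBA = as_tibble(read.csv("./table.csv"))

# Inside subset(), columns can be referenced by name without the DBA$ prefix
DBC = subset(DBA, locationname == "empire state")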

However, I can't make it work on a list, and I have 8.7k names of locations, which I'm not willing to type by hand. I also tried including the select() function on my subset, but I received errors...

I have read answers where this problem was addressed with Python (here, here, or here), but I'm trying to find a solution using R.

akrun · Accepted Answer

If the names are in a vector of strings with length > 1, use %in%:

vec_of_names <- c("empire state", "empire state building")
subset(DBA, locationname %in% vec_of_names)
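
Note that %in% only keeps exact matches. For the partial-match case from the question (e.g. "somewhere near empire state"), here is a minimal sketch in base R. It assumes the DBB names live in a column called name (a hypothetical column name, adjust it to the real one), that the column has no empty or NA entries, and that matching should be case-insensitive. The toy data frames only stand in for the real tables.

```r
# Toy stand-ins for the real tables; the DBB column name "name" is an assumption.
DBA <- data.frame(
  locationname = c("empire state", "somewhere near empire state",
                   "Empire State Building", "central park", "brooklyn"),
  stringsAsFactors = FALSE
)
DBB <- data.frame(
  name = c("empire state", "central park"),
  stringsAsFactors = FALSE
)

# Exact match: keep rows whose location equals any name in DBB.
DBC_exact <- subset(DBA, locationname %in% DBB$name)

# Partial match: keep rows whose location contains any name in DBB.
# fixed = TRUE treats each name as literal text rather than a regex;
# tolower() on both sides makes the comparison case-insensitive.
haystack <- tolower(DBA$locationname)
hits <- logical(nrow(DBA))
for (nm in tolower(DBB$name)) {
  hits <- hits | grepl(nm, haystack, fixed = TRUE)
}
DBC_partial <- DBA[hits, , drop = FALSE]
```

The loop keeps a single logical vector the length of DBA instead of building a rows-by-names matrix, which matters when there are roughly 32k rows and 8.7k names. Substring matching also picks up names embedded inside longer words; if that is a problem, a regex with word boundaries would be needed instead.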
