Byron Pop

Reputation: 31

How to return rows in one DataFrame that partially match the rows in another DataFrame (string match)

I want to return all the rows in list2 that contain the strings in list1.

list1 <- tibble(name = c("the setosa is pretty", "the versicolor is the best", "the mazda is not a flower"))

list2 <- tibble(name = c("the setosa is pretty and the best flower", "the versicolor is the best and a red flower", "the mazda is a great car"))

For example, the code should return "the setosa is pretty and the best flower" from list2 because it contains the phrase "the setosa is pretty" from list1. I have tried:

grepl(list1$name, list2$name)

but I get the following warning: "Warning message: In grepl(commonPhrasesNPSLessthan6$value, dfNPSLessthan6$nps_comment) : argument 'pattern' has length > 1 and only the first element will be used".

I would appreciate some help! Thank you!

EDIT

list1 <- structure(list(value = c("it would not let me", "to go back and change", 
"i was not able to", "there is no way to", "to pay for a credit"
), n = c(15L, 14L, 12L, 11L, 9L)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

list2 <- structure(list(comment = c("it would not let me go back and change things", 
"There is no way to back up without starting allover.", "Could not link blah blah account. ", 
"i really just want to speak to someone - and, now that I'm at the very end of the process-", 
"i felt that some of the information that was asked to provide wasn't necessary", 
"i was not able to to go back and make changes")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Views: 85

Answers (1)

NelsonGon

Reputation: 13309

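EDIT: Based on the new data (matching is case-sensitive, so the comment starting with "There is no way to" is not returned):

library(dplyr)

list2 %>% 
  filter(stringr::str_detect(comment, paste0(list1$value, collapse = "|")))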
# A tibble: 2 x 1
  comment                                      
  <chr>                                        
1 it would not let me go back and change things
2 i was not able to to go back and make changes
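
If the capitalised comment ("There is no way to back up...") should be returned as well, one possible variation is to lower-case both sides and match each phrase as a literal substring. The sketch below is an assumption on my part rather than part of the original answer: it uses only base R, rebuilds the data as plain data frames, and the names `phrases`, `comments`, `keep` and `matched` are made up for illustration.

```r
# Data from the question's EDIT, rebuilt as plain data frames (an assumption
# made so the sketch runs without tibble, dplyr, or stringr installed).
phrases <- data.frame(
  value = c("it would not let me", "to go back and change",
            "i was not able to", "there is no way to", "to pay for a credit"),
  stringsAsFactors = FALSE
)
comments <- data.frame(
  comment = c("it would not let me go back and change things",
              "There is no way to back up without starting allover.",
              "Could not link blah blah account. ",
              "i really just want to speak to someone - and, now that I'm at the very end of the process-",
              "i felt that some of the information that was asked to provide wasn't necessary",
              "i was not able to to go back and make changes"),
  stringsAsFactors = FALSE
)

# For each phrase, test every comment for a literal (fixed = TRUE),
# case-insensitive match, then combine the per-phrase results with OR.
lower_comments <- tolower(comments$comment)
keep <- Reduce(`|`, lapply(tolower(phrases$value), function(p) {
  grepl(p, lower_comments, fixed = TRUE)
}))

# Keep the comments that contain at least one phrase.
matched <- comments[keep, , drop = FALSE]
print(matched)
```

Using `fixed = TRUE` also avoids surprises if a phrase ever contains regex metacharacters such as `.` or `(`, which a pattern built with `collapse = "|"` would interpret as regex syntax.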

ORIGINAL

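A stringr option. The phrases are collapsed into a single alternation pattern so that each row of list2 is tested against every phrase in list1; passing list1$name directly would only compare the two columns position by position:

list2[stringr::str_detect(list2$name, paste(list1$name, collapse = "|")), ]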
# A tibble: 2 x 1
  name                                       
  <chr>                                      
1 the setosa is pretty and the best flower   
2 the versicolor is the best and a red flower

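A base-only solution, using the same collapsed pattern with grepl so that the logical index has one element per row of list2:

list2[grepl(paste(list1$name, collapse = "|"), list2$name), ]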
# A tibble: 2 x 1
  name                                       
  <chr>                                      
1 the setosa is pretty and the best flower   
2 the versicolor is the best and a red flower
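
To see which phrase from list1 is responsible for each returned row, something along these lines may help. This is a rough base R sketch under my own assumptions, not part of the original answer: the data are rebuilt as plain data frames, and `pairs` with its columns `phrase`, `row` and `text` are hypothetical names chosen for illustration.

```r
# Original example data from the question, as plain data frames.
list1 <- data.frame(
  name = c("the setosa is pretty", "the versicolor is the best",
           "the mazda is not a flower"),
  stringsAsFactors = FALSE
)
list2 <- data.frame(
  name = c("the setosa is pretty and the best flower",
           "the versicolor is the best and a red flower",
           "the mazda is a great car"),
  stringsAsFactors = FALSE
)

# For each phrase in list1, grep() returns the row numbers of list2 that
# contain it; each (phrase, row) pair becomes one row of the result.
pairs <- do.call(rbind, lapply(list1$name, function(p) {
  idx <- grep(p, list2$name, fixed = TRUE)
  data.frame(phrase = rep(p, length(idx)),
             row    = idx,
             text   = list2$name[idx],
             stringsAsFactors = FALSE)
}))
print(pairs)

# The matching rows of list2, without duplicates and in their original order.
list2[sort(unique(pairs$row)), , drop = FALSE]
```

Because the result is indexed by row number in list2 rather than by position in list1, it does not depend on the two data frames having the same length or ordering.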

Upvotes: 2
