SergioH

Reputation: 23

Finding matches between each element of a list of strings and each string of a vector in R (avoiding 'for')

I have two columns of strings in a data frame that I want to compare. The first is a character vector and the second is a list in which each element is a small character vector. Imagine a data frame like this one:

V              L
"Anameone"     "name" "asd"
"Bnametwo"     "dfg"
"Cnamethree"   "hey" "C" "hi"

I would like to check whether any of the words in the first element of L appears in the first element of V, whether any of the words in the second element of L appears in the second element of V, and so on.

I could do what I wanted with a loop like this:

for (i in 1:3) { df$matches[i] <- any(sapply(df$L[[i]], grepl, df$V[i], ignore.case = TRUE)) }

So that the output is:

> df$matches
[1] "TRUE"  "FALSE" "TRUE"

But I actually have around 100,000 rows instead of 3, and the loop takes too long. I haven't been able to figure out how to do this more efficiently. Any ideas? All my other attempts without indices ended up producing what would be a 3x3 matrix in this example, because they compare "all with all", and I think that could be even worse than a for loop.

Upvotes: 2

Views: 1085

Answers (3)

Osdorp

Reputation: 320

Something like this?

df <- data.frame(V = c('Anameone','Bnametwo','Cnamethree'),
                 L = I(list(c('name','asd'),c('dfg'),c('hey','C','hi'))))

sapply(1:nrow(df), function(x) any(sapply(df$L[[x]], function(y) grepl(y, df$V[x]))))
# [1]  TRUE FALSE  TRUE
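
A possible variation, offered as a sketch rather than a tested speed-up: mapply() walks V and L in parallel, so no row index is needed, and ignore.case = TRUE mirrors the loop in the question. It assumes V is a character column (hence stringsAsFactors = FALSE), and the variable names words and matches are only illustrative.

```r
# Same toy data as above; stringsAsFactors = FALSE keeps V as character
df <- data.frame(V = c("Anameone", "Bnametwo", "Cnamethree"),
                 L = I(list(c("name", "asd"), "dfg", c("hey", "C", "hi"))),
                 stringsAsFactors = FALSE)

# mapply() passes one (string, words) pair per row to the function
matches <- mapply(
  function(v, words) any(vapply(words, grepl, logical(1), x = v, ignore.case = TRUE)),
  df$V, df$L, USE.NAMES = FALSE
)
matches
# [1]  TRUE FALSE  TRUE
```

Note that grepl() treats each word as a regular expression; if the words may contain characters such as "." or "(", passing fixed = TRUE instead of ignore.case = TRUE would match them literally.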

Upvotes: 1

austensen

Reputation: 3007

You can use purrr::map2_lgl() to iterate over both columns, testing if each element of l is in v with stringr::str_detect(), and then use any() to get just TRUE or FALSE if there are any matches.

library(dplyr)
library(purrr)
library(stringr)

df <- tibble(
  v = c("Anameone", "Bnametwo", "Cnamethree"),
  l = list(c("name", "asd"), "dfg", c("hey", "C", "hi"))
)

mutate(df, matches = map2_lgl(v, l, ~ str_detect(.x, .y) %>% any()))

#> # A tibble: 3 x 3
#>            v         l matches
#>        <chr>    <list>   <lgl>
#> 1   Anameone <chr [2]>    TRUE
#> 2   Bnametwo <chr [1]>   FALSE
#> 3 Cnamethree <chr [3]>    TRUE

Upvotes: 1

Chris

Reputation: 3986

sapply should work:

df<-data.frame(V=c("Anameone","Bnametwo","Cnamethree"),
           L=I(list(c("name","asd"),"dfg",c("hey","C","hi"))))


# build one "a|b|c" pattern from row i's words and test it against row i of V
sapply(seq_len(nrow(df)), function(i)
  grepl(paste(df$L[[i]], collapse = "|"), as.character(df$V[i])))

You'll have to check whether it's faster than the for loop. I couldn't recreate your example.
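
One way to check the timing is sketched below with base R's system.time(). The data is simulated and every name in it (n, words, loop_version, collapsed_version) is hypothetical; the words contain no regex metacharacters, which is what makes joining them with "|" equivalent to testing them one by one.

```r
# Simulated data: n target strings and, per row, 1 to 3 candidate words
set.seed(1)
n <- 1000  # hypothetical size; raise it to approach the real 100,000 rows
words <- c("name", "asd", "dfg", "hey", "hi", "one", "two")
V <- replicate(n, paste(sample(words, 3), collapse = ""))
L <- replicate(n, sample(words, sample(1:3, 1)), simplify = FALSE)

# Row-by-row loop, as in the question
loop_version <- function(V, L) {
  out <- logical(length(V))
  for (i in seq_along(V)) {
    out[i] <- any(sapply(L[[i]], grepl, V[i], ignore.case = TRUE))
  }
  out
}

# One alternation pattern per row, then a single grepl() call per row
collapsed_version <- function(V, L) {
  patterns <- vapply(L, paste, character(1), collapse = "|")
  mapply(grepl, patterns, V,
         MoreArgs = list(ignore.case = TRUE), USE.NAMES = FALSE)
}

t_loop <- system.time(res_loop <- loop_version(V, L))
t_coll <- system.time(res_coll <- collapsed_version(V, L))

identical(res_loop, res_coll)  # both approaches should agree
rbind(loop = t_loop, collapsed = t_coll)[, "elapsed"]
```

The elapsed times depend on the machine and on n, so they are worth measuring on a sample of the real data before committing to either version.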

Upvotes: 0
