Reputation: 359
I have a dataframe of substrings and a list of strings. I want to check which substrings match which element and record the list indices of matches in the dataframe.
my_list <- list("hello there", "how are you?", "I am fine thanks")
words <- data.frame(text = c("he", "she", "they", "you", "I"), index = NA)
The final output should be:
> words
  text index
1   he    NA
2  she    NA
3 they    NA
4  you     2
5    I     3
I've tried a loop with grepl, but it fails in two ways: it records the contents instead of the index, and it does not record the correct element:
for (i in 1:nrow(words)){
  x <- grepl(words$text[i], my_list, fixed = T)
  if (x == T) {
    words$index[i] <- paste(my_list[i])
  }
}
> words
  text       index
1   he hello there
2  she        <NA>
3 they        <NA>
4  you        <NA>
5    I        <NA>
I also tried this answer which looked good but which only returned a vector of FALSEs as long as my_list.
EDIT: I'm a bit closer now with this loop, although it still indexes "he" incorrectly, because "he" matches as a substring of "hello there".
for (i in seq_along(my_list)){
  for (j in 1:nrow(words)){
    if (grepl(words$text[j], my_list[i], fixed = T) == T){
      words$index[[j]] <- i
    }
  }
}
> words
  text index
1   he     1
2  she    NA
3 they    NA
4  you     2
5    I     3
So, how can I match the element? And then, how can I record the matched element's index?
Thanks!
Upvotes: 1
Views: 82
Reputation: 40131
One solution involving dplyr, tidyr, stringr and purrr could be:
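library(dplyr)
library(tidyr)
library(stringr)
library(purrr)
# For each list element, flag whole-word matches (\\b) and multiply by its index
map2_dfr(.x = my_list,
         .y = seq_along(my_list),
         ~ set_names(str_detect(.x, paste0("\\b", words$text, "\\b")) * .y, words$text)) %>%
  summarise_all(max) %>%
  pivot_longer(everything(), names_to = "text", values_to = "index")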
  text  index
  <chr> <int>
1 he        0
2 she       0
3 they      0
4 you       2
5 I         3
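The `\\b` word-boundary anchors are what stop "he" from matching inside "hello there". A minimal base R sketch of the difference, using the data from the question (the names `strings`, `substring_hits` and `word_hits` are only illustrative, and `perl = TRUE` is my choice here rather than something the answer requires):

```r
# Data from the question, flattened to a character vector
my_list <- list("hello there", "how are you?", "I am fine thanks")
strings <- unlist(my_list)

# Fixed substring matching: "he" is found inside "hello" and "there"
substring_hits <- grepl("he", strings, fixed = TRUE)
print(substring_hits)
# TRUE FALSE FALSE

# Word-boundary matching: "he" must stand alone as a whole word
word_hits <- grepl("\\bhe\\b", strings, perl = TRUE)
print(word_hits)
# FALSE FALSE FALSE
```

This is why the loop in the question's EDIT assigns index 1 to "he", while the regex-based version does not.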
Or if you want NAs:
map2_dfr(.x = my_list,
         .y = seq_along(my_list),
         ~ set_names(str_detect(.x, paste0("\\b", words$text, "\\b")) * .y, words$text)) %>%
  summarise_all(~ if (all(. == 0)) NA else max(.)) %>%
  pivot_longer(everything(), names_to = "text", values_to = "index")
  text  index
  <chr> <int>
1 he       NA
2 she      NA
3 they     NA
4 you       2
5 I         3
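If you would rather avoid the extra packages, the same idea can probably be sketched in base R. This is only an alternative under some assumptions: `first_match()` is a hypothetical helper, not part of any package; the words are assumed to contain no regex metacharacters; and it records the first matching element, whereas the pipeline above keeps the largest matching index.

```r
# Data from the question
my_list <- list("hello there", "how are you?", "I am fine thanks")
words <- data.frame(text = c("he", "she", "they", "you", "I"), index = NA)

strings <- unlist(my_list)

# Hypothetical helper: index of the first string containing `word`
# as a whole word, or NA if no string contains it
first_match <- function(word, strings) {
  hits <- which(grepl(paste0("\\b", word, "\\b"), strings, perl = TRUE))
  if (length(hits) == 0) NA_integer_ else hits[1]
}

# as.character() guards against `text` being a factor in older R versions
words$index <- vapply(as.character(words$text), first_match, integer(1),
                      strings = strings, USE.NAMES = FALSE)
print(words)
```

With the example data this should leave `NA` for "he", "she" and "they", and give 2 for "you" and 3 for "I".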
Upvotes: 1