Reputation:
This is my code:
> searchvector <- c("good", "wonderful", "bad", "great", "wonder")
> grepl("wonder", searchvector)
[1] FALSE TRUE FALSE FALSE TRUE
> grepl(paste0("\\b", "wonder", "\\b"), searchvector)
[1] FALSE FALSE FALSE FALSE TRUE
> grepl(paste0("\\baudible\\b|\\b|\\bthalia\\b"), searchvector)
[1] TRUE TRUE TRUE TRUE TRUE
I have a large vector of text, where I want to separate each word to calculate sentiment scores. I only want to match exact words, which I managed to do with \\b.
However, some patterns match every element of searchvector, as you can see in the last call. I was not able to figure out why that is the case. Can anyone explain what goes wrong here?
Upvotes: 0
Views: 48
Reputation: 627190
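Your last pattern, \\baudible\\b|\\b|\\bthalia\\b, contains a "standalone" \\b alternative between the two pipes. A bare \\b matches at any word boundary, so the whole pattern matches every input that contains at least one word character.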
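To see the effect in isolation, here is a minimal sketch that tests a bare \\b on its own. The perl = TRUE argument and the punctuation-only strings are my additions for illustration, not part of the question:

```r
searchvector <- c("good", "wonderful", "bad", "great", "wonder")

# A bare word boundary matches any string that has at least one word character.
# perl = TRUE is an assumption here, so the boundary semantics are PCRE's.
only_boundary <- grepl("\\b", searchvector, perl = TRUE)
print(only_boundary)
# [1] TRUE TRUE TRUE TRUE TRUE

# Strings without any word character contain no word boundary at all.
no_word_chars <- grepl("\\b", c("!!!", "..."), perl = TRUE)
print(no_word_chars)
# [1] FALSE FALSE
```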
You need to remove that empty alternative and wrap the words in a non-capturing group, so that \\b is written only once on each side:
grepl(paste0("\\b(?:audible|thalia)\\b"), searchvector)
R demo:
> searchvector <- c("good", "wonderful", "bad", "great", "wonder")
> grepl(paste0("\\b(?:audible|thalia)\\b"), searchvector)
[1] FALSE FALSE FALSE FALSE FALSE
> searchvector <- c("good", "wonderful", "bad", "great", "wonder", "thalia item")
> grepl(paste0("\\b(?:audible|thalia)\\b"), searchvector)
[1] FALSE FALSE FALSE FALSE FALSE TRUE
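If the words come from a vector rather than being typed by hand, the same pattern can be assembled programmatically. This is only a sketch: the words vector is hypothetical, perl = TRUE is my choice, and it assumes the words contain no regex metacharacters.

```r
# Hypothetical word list; replace with the real terms to look up.
words <- c("audible", "", "thalia")

# Drop empty strings: an empty element would create an empty alternative,
# which reduces to a bare word boundary and matches almost everything.
words <- words[nzchar(words)]

# Collapse the words into one alternation inside a single pair of boundaries.
pattern <- paste0("\\b(?:", paste(words, collapse = "|"), ")\\b")
print(pattern)
# [1] "\\b(?:audible|thalia)\\b"

searchvector <- c("good", "wonderful", "bad", "great", "wonder", "thalia item")
matches <- grepl(pattern, searchvector, perl = TRUE)
print(matches)
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE
```

The nzchar() filter is what guards against the original problem: a stray empty string in the word list produces the same "standalone" \\b behaviour as the doubled pipe in the question.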
Upvotes: 1