user7353167

Regular Expression in R gives me TRUE for every input

This is my code:

searchvector <- c("good", "wonderful", "bad", "great", "wonder")


> grepl("wonder", searchvector)
[1] FALSE  TRUE FALSE FALSE  TRUE
> grepl(paste0("\\b", "wonder", "\\b"), searchvector)
[1] FALSE FALSE FALSE FALSE  TRUE
> grepl(paste0("\\baudible\\b|\\b|\\bthalia\\b"), searchvector)
[1] TRUE TRUE TRUE TRUE TRUE

I have a large vector of text in which I want to separate each word to calculate sentiment scores. I want to match exact words only, which I managed to do with \\b.

However, some patterns match every element of searchvector, as you can see in the last call. I was not able to figure out why that is the case. Can anyone explain what goes wrong here?

Upvotes: 0

Views: 48

Answers (1)

Wiktor Stribiżew

Reputation: 627190

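Your pattern contains a "standalone" \\b alternative (the one between the two | characters). It matches at any word boundary, so it succeeds whenever the input contains at least one word character.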
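A minimal sketch of that effect, reusing the searchvector from the question. The perl = TRUE argument is my own addition (it selects the PCRE engine, whose \b behavior is well defined); the question itself uses the default engine.

```r
searchvector <- c("good", "wonderful", "bad", "great", "wonder")

# A lone \\b matches at any word boundary, so every string that
# contains at least one word character matches.
lone <- grepl("\\b", searchvector, perl = TRUE)
lone
# [1] TRUE TRUE TRUE TRUE TRUE

# Strings without any word character have no word boundary at all.
no_word <- grepl("\\b", c("", "!!!"), perl = TRUE)
no_word
# [1] FALSE FALSE

# The pattern from the question: the middle alternative is the lone \\b,
# so the whole alternation matches every element.
original <- grepl("\\baudible\\b|\\b|\\bthalia\\b", searchvector, perl = TRUE)
original
# [1] TRUE TRUE TRUE TRUE TRUE
```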

You need to remove the standalone \\b alternative and wrap the words in a non-capturing group, so that \\b is written only once on each side:

grepl(paste0("\\b(?:audible|thalia)\\b"), searchvector) 

R demo:

> searchvector <- c("good", "wonderful", "bad", "great", "wonder")
> grepl(paste0("\\b(?:audible|thalia)\\b"), searchvector)
[1] FALSE FALSE FALSE FALSE FALSE
> searchvector <- c("good", "wonderful", "bad", "great", "wonder", "thalia item")
> grepl(paste0("\\b(?:audible|thalia)\\b"), searchvector)
[1] FALSE FALSE FALSE FALSE FALSE  TRUE
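If the pattern is assembled from a vector of words, an empty entry in that vector is one plausible source of the stray alternative. That is an assumption, since the question does not show how the pattern was built. The sketch below uses a hypothetical helper, build_word_pattern (not part of the original post or of base R), which drops empty entries before joining the words. It assumes the words contain no regex metacharacters, and perl = TRUE is again my own choice.

```r
# Hypothetical helper: builds one whole-word pattern from a character
# vector of terms, skipping NA and empty strings.
build_word_pattern <- function(words) {
  words <- words[!is.na(words) & nzchar(words)]  # drop NA and "" entries
  paste0("\\b(?:", paste(words, collapse = "|"), ")\\b")
}

terms <- c("audible", "", "thalia")  # the "" mimics an accidental empty entry
pattern <- build_word_pattern(terms)
pattern
# [1] "\\b(?:audible|thalia)\\b"

searchvector <- c("good", "wonderful", "bad", "great", "wonder", "thalia item")
result <- grepl(pattern, searchvector, perl = TRUE)
result
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE
```

Without the filtering step, the empty entry would produce an empty alternative inside the group, which should not be relied on to behave the same way across regex engines.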

Upvotes: 1
