icedcoffee

Reputation: 1015

Detecting whole words using str_detect() in R

I have a character vector in R:

string <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD")

I want to keep every entry that contains FLT1, but not entries where other alphanumeric characters are attached to it. In other words, I want to keep all entries except the second one: they all mention FLT1, but the second one mentions FLT1P1.

When I use str_detect(), it returns TRUE for every entry:

str_detect(string, "FLT1")
[1] TRUE TRUE TRUE TRUE

Can anyone advise on the best method to only return the items that mention FLT1?

Upvotes: 2

Views: 9795

Answers (4)

Chris Ruehlemann

Reputation: 21400

The best way is to use \\b, as noted by others. Alternatively, you can use a positive lookahead:

Data:

x <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD")

Solution:

grep("FLT1(?=$|-|,)", x, perl = TRUE, value = TRUE)
[1] "FLT1"          "FLT1-FLT2"     "SGY-FLT1, GPD"

Here, grep matches FLT1 if, and only if, what immediately follows is the end of the string ($), a hyphen (-), or a comma (,). By implication, it does not match when the next character is, for example, alphanumeric.

Or, if the rule is to exclude values where alphanumeric characters follow FLT1, you can use a negative lookahead:

grep("FLT1(?!\\w)", x, perl = TRUE, value = TRUE)
[1] "FLT1"          "FLT1-FLT2"     "SGY-FLT1, GPD"
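
Note that both lookahead patterns above only inspect what comes after FLT1. As a minimal sketch of what that implies, the vector `y` below is a hypothetical one (not from the question) that adds a value with a letter in front of FLT1, and the second pattern adds a negative lookbehind to guard the left side as well:

```r
# Hypothetical values, added only to illustrate the left-hand side
y <- c("FLT1", "FLT1P1", "AFLT1")

# Negative lookahead alone: only the character after FLT1 is checked
right_only <- grepl("FLT1(?!\\w)", y, perl = TRUE)
print(right_only)
# [1]  TRUE FALSE  TRUE

# Adding a negative lookbehind also rejects a word character before FLT1
both_sides <- grepl("(?<!\\w)FLT1(?!\\w)", y, perl = TRUE)
print(both_sides)
# [1]  TRUE FALSE FALSE
```

Whether the left-hand check is needed depends on whether values such as "AFLT1" can occur in the data.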

Upvotes: 2

Gregor Thomas

Reputation: 145755

Word boundaries with \\b will probably work. A \\b matches at the beginning or end of the string and at any transition between a word character (a letter, digit, or underscore) and a character that is not one.

str_detect(string, "\\bFLT1\\b")
[1]  TRUE FALSE  TRUE  TRUE
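
Since the goal is to keep the matching entries rather than just flag them, the logical result can be used as an index. This is a rough sketch in base R (grepl() instead of str_detect()), assuming the vector from the question is stored in `string`:

```r
# Assumed: the question's vector, stored under the name `string`
string <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD")

# \\b on both sides requires FLT1 to stand alone as a word
keep <- grepl("\\bFLT1\\b", string, perl = TRUE)

# Subset with the logical index to keep only the whole-word matches
kept <- string[keep]
print(kept)
# [1] "FLT1"          "FLT1-FLT2"     "SGY-FLT1, GPD"
```

With stringr loaded, string[str_detect(string, "\\bFLT1\\b")] or str_subset(string, "\\bFLT1\\b") should give the same entries.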

Upvotes: 8

jay.sf

Reputation: 72623

To me, "no other characters added" means a word boundary, which is expressed by \\b.

x <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD")
stringr::str_detect(x, "FLT1\\b")
# [1]  TRUE FALSE  TRUE  TRUE

Or base R:

grepl("FLT1\\b", x)
# [1]  TRUE FALSE  TRUE  TRUE

Upvotes: 2

Bruno

Reputation: 4151

Use lookarounds:

library(stringr)

x <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD","AFLT1")

x %>% 
  str_detect("(?<![:alpha:])FLT1(?![:alpha:])")
#> [1]  TRUE FALSE  TRUE  TRUE FALSE

Created on 2020-06-17 by the reprex package (v0.3.0)
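
One caveat, sketched here in base R with perl = TRUE rather than stringr, and with hypothetical test values that are not in the question: a lookaround on letters only still accepts a trailing digit, whereas the alphanumeric class and \\b differ in how they treat an underscore.

```r
# Hypothetical values chosen to show where the three patterns disagree
y <- c("FLT1", "FLT12", "FLT1_x")

# Letters only: a following digit or underscore does not block the match
alpha_only <- grepl("(?<![[:alpha:]])FLT1(?![[:alpha:]])", y, perl = TRUE)
print(alpha_only)
# [1] TRUE TRUE TRUE

# Letters and digits: "FLT12" is rejected, the underscore is still allowed
alnum <- grepl("(?<![[:alnum:]])FLT1(?![[:alnum:]])", y, perl = TRUE)
print(alnum)
# [1]  TRUE FALSE  TRUE

# Word boundaries: the underscore counts as a word character too
word <- grepl("\\bFLT1\\b", y, perl = TRUE)
print(word)
# [1]  TRUE FALSE FALSE
```

Which variant is right depends on whether digits and underscores should count as "added characters" in the data at hand.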

Upvotes: 1
