Reputation: 1015
I have a character vector in R:
string <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD")
I want to keep all matches that have FLT1 but not when other alphanumeric characters are added. In other words, I want to keep all entries except the second one, as all of them mention FLT1, but the second one mentions FLT1P1.
When I use str_detect, it returns everything as true:
str_detect(string, "FLT1")
[1] TRUE TRUE TRUE TRUE
Can anyone advise on the best method to only return the items that mention FLT1?
Upvotes: 2
Views: 9795
Reputation: 21400
The best way is to use \\b, as noted by others.
Alternatively you can use positive lookahead:
Data:
x <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD")
Solution:
grep("FLT1(?=$|-|,)", x, perl = TRUE, value = TRUE)
[1] "FLT1" "FLT1-FLT2" "SGY-FLT1, GPD"
Here, grep matches FLT1 if, and only if, what immediately follows is a hyphen (-), a comma (,), or the end of the string ($). By implication, it does not match when the next character is, for example, alphanumeric.
Or, if the rule is that you want to exclude values where alphanumeric characters are added, you can use negative lookahead:
grep("FLT1(?!\\w)", x, perl = TRUE, value = TRUE)
[1] "FLT1" "FLT1-FLT2" "SGY-FLT1, GPD"
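The two lookaheads agree on the sample data but not in general. Here is a minimal sketch of where they differ, using a couple of made-up extra values ("FLT1 GPD" and "FLT1_2" are assumptions for illustration, not from the question):

```r
# Hypothetical extra values, added only to show where the two patterns diverge.
y <- c("FLT1", "FLT1P1", "FLT1-FLT2", "FLT1 GPD", "FLT1_2")

# Positive lookahead: FLT1 must be followed by end of string, "-" or ",".
pos <- grepl("FLT1(?=$|-|,)", y, perl = TRUE)

# Negative lookahead: FLT1 must not be followed by a word character
# (letter, digit or underscore).
neg <- grepl("FLT1(?!\\w)", y, perl = TRUE)

print(data.frame(value = y, positive = pos, negative = neg))
```

Only "FLT1 GPD" is treated differently: the positive lookahead rejects it because a space is not in its list of allowed followers, while the negative lookahead accepts it because a space is not a word character.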
Upvotes: 2
Reputation: 145755
Word boundaries with \\b will probably work. A \\b matches at any transition between a word character (a letter, digit, or underscore) and a non-word character, and at the start or end of the string when a word character sits there.
str_detect(string, "\\bFLT1\\b")
[1] TRUE FALSE TRUE TRUE
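Since the goal is to keep the matching entries rather than just flag them, the logical vector can be used to subset. A sketch in base R, assuming the vector is stored under the name string used in the question's str_detect call (perl = TRUE is my choice here, so that \\b follows PCRE rules):

```r
string <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD")

# TRUE where FLT1 appears as a whole word.
keep <- grepl("\\bFLT1\\b", string, perl = TRUE)

# Subset the original vector with the logical index.
kept <- string[keep]
print(kept)
```

With stringr loaded, str_subset(string, "\\bFLT1\\b") should give the same result in one step.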
Upvotes: 8
Reputation: 72623
"No other characters added" means a word boundary to me, which is expressed by \\b.
x <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD")
stringr::str_detect(x, "FLT1\\b")
# [1] TRUE FALSE TRUE TRUE
Or base R:
grepl("FLT1\\b", x)
# [1] TRUE FALSE TRUE TRUE
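Note that this pattern only anchors the right-hand side of FLT1. Here is a small sketch of what a leading \\b changes, using a made-up value "AFLT1" that is not in the question's data:

```r
# "AFLT1" is a hypothetical value used to probe the left-hand side.
x2 <- c("FLT1", "FLT1P1", "AFLT1")

# Boundary after FLT1 only: anything may precede it.
right_only <- grepl("FLT1\\b", x2, perl = TRUE)

# Boundary on both sides: FLT1 must stand alone as a word.
both_sides <- grepl("\\bFLT1\\b", x2, perl = TRUE)

print(data.frame(value = x2, right_only = right_only, both_sides = both_sides))
```

If a prefixed value such as "AFLT1" should be excluded as well, the two-sided form is probably the safer choice.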
Upvotes: 2
Reputation: 4151
Use lookarounds:
library(stringr)
x <- c("FLT1", "FLT1P1", "FLT1-FLT2", "SGY-FLT1, GPD","AFLT1")
x %>%
str_detect("(?<![:alpha:])FLT1(?![:alpha:])")
#> [1] TRUE FALSE TRUE TRUE FALSE
Created on 2020-06-17 by the reprex package (v0.3.0)
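The pattern above only excludes letters next to FLT1, so a trailing digit would still match. If digits should also count as "other alphanumeric characters", [:alnum:] can stand in for [:alpha:]. Here is a base R sketch; with perl = TRUE the class has to be wrapped in an extra pair of brackets, and "FLT12" is a made-up value:

```r
# "FLT12" is a hypothetical value with a digit appended.
x3 <- c("FLT1", "FLT1P1", "FLT12", "AFLT1")

# Letters on either side block the match; digits do not.
alpha_only <- grepl("(?<![[:alpha:]])FLT1(?![[:alpha:]])", x3, perl = TRUE)

# Letters and digits on either side both block the match.
alnum <- grepl("(?<![[:alnum:]])FLT1(?![[:alnum:]])", x3, perl = TRUE)

print(data.frame(value = x3, alpha_only = alpha_only, alnum = alnum))
```

Which variant is right depends on whether a value like "FLT12" should count as a mention of FLT1.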
Upvotes: 1