Julien Navarre

Reputation: 7840

How to match strings matching [a-z_]* but with non repetitive symbol "_"

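I would like to match strings that contain only the characters [a-z_], that do not start or end with "_", and that never contain two consecutive "_".

For example, the expected matching results would be: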

"x"; "x_x" > TRUE
"_x"; "x_"; "_x_"; "x__x" > FALSE

My first problem is that I can exclude strings starting or ending with "_", but my regexp then also excludes strings of length 1:

grepl("^[a-z][a-z_]*[a-z]$", my.string)

My second issue is that I don't know how to negate a match for a doubled character, such as grepl("(_)\\1", my.string), or how to integrate that with the first part of my regexp.

If possible I would like to do this with perl = FALSE.

Upvotes: 2

Views: 308

Answers (3)

lmo

Reputation: 38510

Another regex, which uses grouping with ( ) and the * quantifier for repetition.

myString <- c("x_", "x", "_x", "x_x_x", "x_x", "x__x")

grepl("^([a-z]_)*[a-z]$", myString)
[1] FALSE  TRUE FALSE  TRUE  TRUE FALSE

So ^([a-z]_)* matches zero or more "[a-z]_" pairs at the beginning of the string, and [a-z]$ ensures that the final character is a lowercase letter.
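
One thing worth checking is how this pattern treats segments longer than one letter, since each "[a-z]_" pair allows exactly one letter before an underscore. The sketch below compares it with a variant that uses [a-z]+; the test vector segs and the variable names are made up for illustration and are not from the question.

```r
# Hypothetical test vector with multi-letter segments (not in the question)
segs <- c("x", "x_x", "ab_cd", "abc", "x__x", "_x")

# Pattern from this answer: exactly one letter per segment
single_letter <- grepl("^([a-z]_)*[a-z]$", segs)
single_letter
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE

# Variant: [a-z]+ lets each segment contain one or more letters
multi_letter <- grepl("^([a-z]+_)*[a-z]+$", segs)
multi_letter
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
```

If the strings can contain runs of letters such as "ab_cd", the [a-z]+ variant is probably the one to use.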

Upvotes: 1

Wiktor Stribiżew

Reputation: 627100

You need to use the following TRE regex:

grepl("^[a-z]+(?:_[a-z]+)*$", my.string)

Details:

  • ^ - start of string
  • [a-z]+ - one or more lowercase ASCII letters
  • (?:_[a-z]+)* - zero or more sequences (*) of
    • _ - an underscore
    • [a-z]+ - one or more lowercase ASCII letters
  • $ - end of string.

R demo:

my.string <- c("x" ,"x_x", "x_x_x_x_x","_x", "x_", "_x_", "x__x")
grepl("^[a-z]+(?:_[a-z]+)*$", my.string)
## => [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
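
As a minimal sketch of how this could be packaged, the pattern can be wrapped in a small helper that passes perl = FALSE explicitly, as the question requests. The function name is_clean_name and the input vector are hypothetical, chosen only for illustration.

```r
# Hypothetical helper; the name is illustrative and not from the answer
is_clean_name <- function(x) {
  # perl = FALSE is already the default for grepl(); it is spelled out here
  # only because the question asks for it
  grepl("^[a-z]+(?:_[a-z]+)*$", x, perl = FALSE)
}

# Hypothetical inputs, including a multi-letter name and an empty string
res <- is_clean_name(c("x", "ab_cd", "x__x", ""))
res
# [1]  TRUE  TRUE FALSE FALSE
```

The empty string is rejected because the leading [a-z]+ requires at least one letter.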

Upvotes: 2

IRTFM

Reputation: 263441

This seems to identify the items correctly:

dat <- c("x" ,"x_x","_x", "x_", "_x_", "x__x")
grep("^_|__|_$", dat, invert=TRUE)
[1] 1 2

So try:

!grepl("^_|__|_$", dat)
[1]  TRUE  TRUE FALSE FALSE FALSE FALSE

This just uses negation and a pattern with three alternatives (a leading "_", a doubled "_", or a trailing "_") separated by the regex alternation operator "|".
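
Note that the negated pattern only rules out misplaced underscores; it does not check that the remaining characters are in [a-z], and it accepts the empty string. If that matters, one possible sketch is to combine it with a positive character-class test. The vector dat2 and the variable names below are hypothetical.

```r
# Hypothetical inputs: an uppercase/digit string and an empty string are added
dat2 <- c("x", "x_x", "X_1", "", "x__x")

# Negation alone: only underscore placement is checked
neg_only <- !grepl("^_|__|_$", dat2)
neg_only
# [1]  TRUE  TRUE  TRUE  TRUE FALSE

# Combined with a test that every character is in [a-z_] (at least one)
combined <- grepl("^[a-z_]+$", dat2) & !grepl("^_|__|_$", dat2)
combined
# [1]  TRUE  TRUE FALSE FALSE FALSE
```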

Upvotes: 2
