Using regular expressions in R to detect one or two repeated characters within a class of characters

Question

I would like to detect one or more repeated characters within a class of characters, but not a combination of unique characters within the class. In the example below, we're looking for instances of p's, t's, or k's before an r. All three words satisfy the regular expression below, but I would like to exclude cases like bektri where we have two different consonants before r.

library(stringr)
example <- c("betri", "bettri", "bektri")
str_detect(example, "[ptk]r")
#> [1] TRUE TRUE TRUE

So betri and bettri are good, but bektri is bad. Any tips?

paqmo · Accepted Answer

How about this?

library(stringr)
example <- c("betri", "bettri", "bektri")
str_detect(example, "([ptk])(\\1+)r|([^ptk])([ptk])r")
#> [1]  TRUE  TRUE FALSE

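([ptk])(\\1+)r matches a p, t, or k that is repeated at least once (tt, ttt, and so on) before an r;
\\1+ is a backreference: it matches one or more further copies of whatever the first group, ([ptk]), captured. The backslash is doubled because the pattern is written as an R string;
([^ptk])([ptk])r matches a single p, t, or k before an r when it is not preceded by another p, t, or k.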
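
To see which alternative is responsible for each match, the two halves of the pattern can be run separately. This is only a sketch: it uses base R's grepl() with perl = TRUE rather than str_detect() so that it runs without extra packages (that substitution is my assumption, not part of the answer above), and the variable names doubled and single are made up for illustration.

```r
example <- c("betri", "bettri", "bektri")

# Alternative 1: a p, t, or k immediately repeated at least once, then r.
doubled <- grepl("([ptk])\\1+r", example, perl = TRUE)

# Alternative 2: a single p, t, or k before r, preceded by a character
# that is not p, t, or k.
single <- grepl("[^ptk][ptk]r", example, perl = TRUE)

doubled
#> [1] FALSE  TRUE FALSE
single
#> [1]  TRUE FALSE FALSE

# Either alternative matching is equivalent to the combined pattern.
doubled | single
#> [1]  TRUE  TRUE FALSE
```

bektri fails both tests: k and t are different letters, so the backreference cannot match, and the t before r is preceded by a k, so the second alternative is ruled out as well.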

You could also generalize to include any consonant that follows that pattern:

library(stringr)
example <- c("betri", "bettri", "bektri", "aepro", "aepo", "aeppro")
str_detect(example, "([b-df-hj-np-tv-z])(\\1+)r|([^b-df-hj-np-tv-z])([b-df-hj-np-tv-z])r")
#> [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
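
Since the character class appears three times in the pattern, it may be easier to build the pattern from a single definition of the class. The following is a minimal sketch under a few assumptions: make_pattern() is a hypothetical helper (not part of stringr or base R), its argument is the body of a bracket expression, and base R's grepl() with perl = TRUE stands in for str_detect().

```r
# Hypothetical helper: build the "same letter repeated, or a single letter,
# before r" pattern for any bracket-expression body such as "ptk".
make_pattern <- function(chars) {
  sprintf("([%s])\\1+r|[^%s][%s]r", chars, chars, chars)
}

consonants <- "b-df-hj-np-tv-z"
example <- c("betri", "bettri", "bektri", "aepro", "aepo", "aeppro")

result <- grepl(make_pattern(consonants), example, perl = TRUE)
result
#> [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
```

Calling make_pattern("ptk") gives back the narrower p/t/k pattern, so the same helper covers both cases.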
