user9974638

Reputation: 199

Using regular expressions in R to detect one or two repeated characters within a class of characters

I would like to detect one or more repeated characters within a class of characters, but not a combination of unique characters within the class. In the example below, we're looking for instances of p's, t's, or k's before an r. All three words satisfy the regular expression below, but I would like to exclude cases like bektri where we have two different consonants before r.

library(stringr)
example <- c("betri", "bettri", "bektri")
str_detect(example, "[ptk]r")

So betri and bettri are good, but bektri is bad. Any tips?

Upvotes: 2

Views: 715

Answers (2)

Junitar

Reputation: 999

You can use a negative lookbehind (?<!) to exclude matches when your letter combinations are preceded by k.

library(stringr)
example <- c("betri", "bettri", "bektri")
str_detect(example, "(?<!k)[ptk]r")
[1]  TRUE  TRUE FALSE

Edit:
I notice that I misread your post and you need to exclude matches when you have two different consonants before r.

Then I would use the following regex: (?<![^aeuioy])([^aeuioy])\\1?r. It matches a single or doubled consonant before r, whether the consonant is at the beginning of the word or in the middle of it. Here a "consonant" is any character other than a, e, u, i, o, or y.
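To illustrate, here is a minimal sketch of that pattern applied to the words from the question plus one word-initial case ("tri", which I added for illustration). It uses base R's grepl() with perl = TRUE rather than str_detect(), purely so that it runs without any packages; that substitution is my assumption, not part of the answer above.

```r
# Words from the question, plus "tri" (added here) to check the word-initial case.
example <- c("betri", "bettri", "bektri", "tri")

# (?<![^aeuioy])  the consonant must not be preceded by another non-vowel
# ([^aeuioy])     capture one non-vowel character
# \\1?            optionally the same character once more
# r               followed by r
pattern <- "(?<![^aeuioy])([^aeuioy])\\1?r"

# perl = TRUE switches to PCRE, which supports lookbehind and backreferences.
res <- grepl(pattern, example, perl = TRUE)
print(res)
#> [1]  TRUE  TRUE FALSE  TRUE
```

Note that [^aeuioy] matches any character that is not one of those lowercase vowels, so digits, punctuation, and uppercase letters also count as "consonants" here. If that matters for your data, an explicit consonant class is safer.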

Upvotes: 1

paqmo

Reputation: 3729

How about this?

library(stringr)
example <- c("betri", "bettri", "bektri")
str_detect(example, "([ptk])(\\1+)r|([^ptk])([ptk])r")
#> [1]  TRUE  TRUE FALSE

([ptk])(\\1+)r matches a p, t, or k that is repeated one or more times before an r;
\\1+ matches one or more repetitions of the character captured by the first group, ([ptk]);
([^ptk])([ptk])r matches a single p, t, or k before an r when it is preceded by a character other than p, t, or k.
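Two edge cases may be worth checking against your data. The second alternative needs some character before the consonant, so a word-initial cluster such as "tri" is not matched, and the first alternative does not look at what precedes the doubled consonant, so "bekttri" is matched. The sketch below compares the pattern above with a hypothetical variant that uses a negative lookbehind instead. The variant, the extra test words, and the use of base R's grepl() with perl = TRUE (so it runs without stringr) are my assumptions, not part of the original answer.

```r
# First three words are from the question; "tri" and "bekttri" are added edge cases.
words <- c("betri", "bettri", "bektri", "tri", "bekttri")

# Pattern from this answer.
original <- "([ptk])(\\1+)r|([^ptk])([ptk])r"

# Hypothetical variant: a p, t, or k that is not preceded by p, t, or k,
# optionally doubled (\\1?), then r. The lookbehind also succeeds at the
# start of the string.
variant <- "(?<![ptk])([ptk])\\1?r"

# perl = TRUE switches to PCRE, which supports lookbehind and backreferences.
res_original <- grepl(original, words, perl = TRUE)
res_variant  <- grepl(variant,  words, perl = TRUE)

print(res_original)
#> [1]  TRUE  TRUE FALSE FALSE  TRUE
print(res_variant)
#> [1]  TRUE  TRUE FALSE  TRUE FALSE
```

Whether "bekttri" should count as a match depends on what you want; the variant treats it the same way as "bektri".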

You could also generalize to include any consonant that follows that pattern:

library(stringr)
example <- c("betri", "bettri", "bektri", "aepro", "aepo", "aeppro")
str_detect(example, "([b-df-hj-np-tv-z])(\\1+)r|([^b-df-hj-np-tv-z])([b-df-hj-np-tv-z])r")
#> [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE

Upvotes: 1
