How to match distinct repeated characters

Question

I'm trying to write a regex in R that matches strings containing two runs of repeated characters, where the two repeated characters are different.

x <- c("aaaaaaah" ,"aaaah","ahhhh","cooee","helloee","mmmm","noooo","ohhhh","oooaaah","ooooh","sshh","ummmmm","vroomm","whoopee","yippee")

This regex matches all of the above, including strings such as "mmmm" and "ohhhh" where the repeated letter is the same in the first and the second repetition:

grep(".*([a-z])\\1.*([a-z])\\2", x, value = TRUE)
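A small sketch of why this happens: extracting the two capture groups with regexec() and regmatches() shows that nothing stops Group 2 from capturing the same letter as Group 1. The names s, pat, g1 and g2 are illustrative only. perl = TRUE is used here as a choice; for these three strings there is only one way to match, so the captured letters should not depend on the engine.

```r
# Inspect what the question's pattern captures for a few strings
s <- c("mmmm", "ohhhh", "sshh")
pat <- ".*([a-z])\\1.*([a-z])\\2"

m <- regmatches(s, regexec(pat, s, perl = TRUE))
# Each m[[i]] holds the full match, then Group 1, then Group 2
g1 <- vapply(m, `[`, character(1), 2)
g2 <- vapply(m, `[`, character(1), 3)

print(data.frame(string = s, group1 = g1, group2 = g2))
```

For "mmmm" and "ohhhh" both groups hold the same letter, which is exactly the case the question wants to exclude.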

What I'd like to match in x are these strings where the repeated letters are distinct:

"cooee","helloee","oooaaah","sshh","vroomm","whoopee","yippee"

How can the regex be tweaked to make sure the second repeated character is not the same as the first?

Wiktor Stribiżew · Accepted Answer

You can restrict the second character pattern with a negative lookahead:

grep(".*([a-z])\\1.*(?!\\1)([a-z])\\2", x, value=TRUE, perl=TRUE)
#                   ^^^^^^^


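(?!\1)([a-z]) means: match and capture into Group 2 any lowercase ASCII letter, provided it is not the same as the value captured in Group 1. perl=TRUE is required because R's default (TRE) regex engine does not support lookaheads.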
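To see the lookahead at work, a sketch like the following (variable names are illustrative) extracts the two groups under the answer's pattern. Strings that match now report two different letters, and a string with only one repeated letter, such as "mmmm", produces no match, so its groups come out as NA.

```r
# Capture groups under the answer's pattern (lookahead needs perl = TRUE)
pat2 <- ".*([a-z])\\1.*(?!\\1)([a-z])\\2"
s2 <- c("oooaaah", "whoopee", "mmmm")

m2 <- regmatches(s2, regexec(pat2, s2, perl = TRUE))
# A non-match leaves character(0); indexing it yields NA
g1 <- vapply(m2, `[`, character(1), 2)
g2 <- vapply(m2, `[`, character(1), 3)

print(data.frame(string = s2, group1 = g1, group2 = g2))
```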

R demo:

x <- c("aaaaaaah" ,"aaaah","ahhhh","cooee","helloee","mmmm","noooo","ohhhh","oooaaah","ooooh","sshh","ummmmm","vroomm","whoopee","yippee")
grep(".*([a-z])\\1.*(?!\\1)([a-z])\\2", x, value=TRUE, perl=TRUE)
# => "cooee"   "helloee" "oooaaah" "sshh"    "vroomm"  "whoopee" "yippee"
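As a quick sanity check, one can negate grepl() with the same pattern to list the strings it rejects; these should be exactly the ones whose only repeated letter is a single character. This is a sketch; the names pat, keep and rejected are illustrative.

```r
x <- c("aaaaaaah","aaaah","ahhhh","cooee","helloee","mmmm","noooo","ohhhh","oooaaah","ooooh","sshh","ummmmm","vroomm","whoopee","yippee")
pat <- ".*([a-z])\\1.*(?!\\1)([a-z])\\2"

# TRUE where two distinct doubled letters occur
keep <- grepl(pat, x, perl = TRUE)
rejected <- x[!keep]
print(rejected)
# => "aaaaaaah" "aaaah"    "ahhhh"    "mmmm"     "noooo"    "ohhhh"    "ooooh"    "ummmmm"
```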
