Lars Kecker

Reputation: 90

Regex matching a word containing a character exactly two times in a row

The problem

As stated in the title, my goal is to find a regex that matches a word if and only if it contains a run of exactly two consecutive identical characters, i.e. a doubled character that is neither preceded nor followed by that same character.

Test cases

Things I've tried before

The regex [a-zA-Z]*([a-zA-Z])\1[a-zA-Z]* matches a word with at least two consecutive identical characters, but belllike would still match because there is no upper limit on the number of consecutive characters.
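A quick way to see this behaviour is the minimal Python sketch below. It assumes Python's re module treats this pattern the same way as Java does; the helper name matches_whole_word and the test words other than belllike are made up for illustration.

```python
import re

# First attempt: at least two identical letters in a row, with no upper limit.
AT_LEAST_TWO = re.compile(r"[a-zA-Z]*([a-zA-Z])\1[a-zA-Z]*")

def matches_whole_word(pattern, word):
    """Return True if the pattern matches the entire word."""
    return pattern.fullmatch(word) is not None

print(matches_whole_word(AT_LEAST_TWO, "bell"))      # exactly two l's: accepted
print(matches_whole_word(AT_LEAST_TWO, "belllike"))  # three l's: also accepted (unwanted)
print(matches_whole_word(AT_LEAST_TWO, "word"))      # no doubled letter: rejected
```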

I also tried to use negative lookaheads and lookbehinds. For one letter, this may look like this:

[a-zA-Z]*(?<!a)aa(?!a)[a-zA-Z]*

This regex fulfills all requirements for the letter a but neither I nor the people I asked could generalize it to using capture groups and thus working for any letter (copy-pasting this statement 26 times - once for each letter - and combining them with OR is not the solution I am looking for, even though it would probably work).
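For reference, here is a rough Python sketch of both the single-letter regex and the brute-force alternation I want to avoid. It is only meant to show what the copy-paste approach amounts to. The names ONLY_A, BRUTE_FORCE and is_match are made up, and since the character class is [a-zA-Z] the alternation is built over all 52 ASCII letters, case-sensitively.

```python
import re
import string

# Single-letter version: "aa" that is neither preceded nor followed by "a".
ONLY_A = re.compile(r"[a-zA-Z]*(?<!a)aa(?!a)[a-zA-Z]*")

# Brute force: repeat the same lookaround construct for every ASCII letter
# (upper and lower case, treated as distinct characters) and join with "|".
per_letter = "|".join(
    f"(?<!{c}){c}{c}(?!{c})" for c in string.ascii_letters
)
BRUTE_FORCE = re.compile(rf"[a-zA-Z]*(?:{per_letter})[a-zA-Z]*")

def is_match(pattern, word):
    """Return True if the pattern matches the entire word."""
    return pattern.fullmatch(word) is not None

for word in ["baab", "baaab", "bell", "belllike"]:
    print(word, is_match(ONLY_A, word), is_match(BRUTE_FORCE, word))
```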

What I'm looking for

A solution to the described problem would be great, of course. If it cannot be done with regex, I would be equally happy with an explanation of why that is not possible.

Background

This task was part of an assignment I had to do for uni. In a later conversation, the prof stated that they didn't actually mean to ask that question and were fine with accepting character sequences of three or more identical characters. However, the struggle of trying to find a solution to this problem made me curious whether this is actually possible with regex and, if so, how it could be done.

Regex flavor to use

Even though the initial task should be done in the Java 8+ regex flavour, I would be fine with a solution in any regex flavor that solves the described problem.

Upvotes: 4

Views: 1392

Answers (2)

Ryszard Czech

Reputation: 18611

Use

^(.)\1(?!\1)|(.?)(?!\2)(.)\3(?!\3)

See proof.

EXPLANATION

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .?                       any character except \n (optional
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \2                       what was matched by capture \2
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )                        end of \3
--------------------------------------------------------------------------------
  \3                       what was matched by capture \3
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \3                       what was matched by capture \3
--------------------------------------------------------------------------------
  )                        end of look-ahead
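As a sanity check, the pattern can be exercised with a short Python sketch. This assumes Python's re module behaves like the target flavor for these constructs; PAIR, has_exact_pair and the sample words are illustrative names and data, not part of the original answer. Note that the pattern locates the pair inside the string, so it is used with search() rather than a whole-string match.

```python
import re

# Either the string starts with an exact pair, or an exact pair is preceded
# by one character that differs from the doubled character.
PAIR = re.compile(r"^(.)\1(?!\1)|(.?)(?!\2)(.)\3(?!\3)")

def has_exact_pair(word):
    """Return True if the word contains a run of exactly two equal characters."""
    return PAIR.search(word) is not None

for word in ["bell", "aab", "abba", "belllike", "aaa", "word"]:
    print(word, has_exact_pair(word))
```

When group 2 matches the empty string, the lookahead (?!\2) always fails, which is why the separate ^ alternative is needed for a pair at the very start of the string.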

If the regex flavor supports infinite-width lookbehinds (the lookbehind below contains a backreference, which not every engine accepts):

(.)\1(?!\1)(?<!\1..)

See proof.

EXPLANATION

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )                        end of look-behind
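This variant can be tried in Python as well, as a sketch under one assumption: Python's re module (3.5 and later) accepts a backreference inside a lookbehind when the referenced group has a fixed width, which is the case here. Other flavors may reject the pattern. PAIR_LOOKBEHIND and has_exact_pair are illustrative names.

```python
import re

# After matching the pair, look back three characters: the character just
# before the pair must not be the doubled character itself.
PAIR_LOOKBEHIND = re.compile(r"(.)\1(?!\1)(?<!\1..)")

def has_exact_pair(word):
    """Return True if the word contains a run of exactly two equal characters."""
    return PAIR_LOOKBEHIND.search(word) is not None

for word in ["bell", "aab", "abba", "belllike", "aaa", "word"]:
    print(word, has_exact_pair(word))
```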

Upvotes: 0

JvdV

Reputation: 75860

You can try:

^(?:.*?(.)(?!\1))?(.)\2(?!\2).*$

See a demo

  • ^ - Start line anchor.
  • (?: - Open non-capture group:
    • .*? - 0+ chars other than newline (lazy), up to;
    • (.)(?!\1) - A first capture group of a single char other than newline but assert it's not followed by the same char using a negative lookahead holding a backreference to this char.
    • )? - Close non-capture group and make it optional.
  • (.)\2(?!\2) - A second capture group of a single char other than newline, followed by a backreference to it so that the same char occurs twice in a row, then a negative lookahead holding the same backreference to assert the position is not followed by that char a third time.
  • .* - 0+ chars other than newline (greedy), up to;
  • $ - End line anchor.
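The breakdown above can be checked with a small Python sketch. It assumes Python's re module handles these constructs the same way as the demo's flavor; WORD, is_valid and the sample words are made up for illustration.

```python
import re

# Anchored pattern: validates the whole word rather than locating the pair.
WORD = re.compile(r"^(?:.*?(.)(?!\1))?(.)\2(?!\2).*$")

def is_valid(word):
    """Return True if the word contains a run of exactly two equal characters."""
    return WORD.match(word) is not None

for word in ["bell", "aab", "aaabb", "belllike", "aaa", "word"]:
    print(word, is_valid(word))
```

The optional group consumes everything up to a char that differs from its successor, so the pair that follows cannot be the tail of a longer run; making the group optional covers a pair at the very start of the word.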


Upvotes: 6
