Reputation: 90
As stated in the title, my goal is to find a regex that matches a word if and only if it contains a run of exactly two identical consecutive characters, i.e. a doubled character that is neither preceded nor followed by that same character.
Helo
--> false
programming
--> true
belllike
--> false
(since there are three ls)
shellless
--> true
(even though there are three ls, this input should match because of the two ss)
The regex
[a-zA-Z]*([a-zA-Z])\1[a-zA-Z]*
matches a word with at least two consecutive identical characters, but belllike would still match because there is no upper limit on the number of consecutive characters.
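To make the failure concrete, here is a small Java sketch that runs this first attempt against the four sample words using a full match. The class and method names are made up for illustration and are not part of the assignment.

```java
import java.util.regex.Pattern;

class NaiveDoubleLetter {
    // First attempt from the question: some letter immediately repeated,
    // with no upper bound on how long the run of that letter may be.
    static final Pattern NAIVE =
            Pattern.compile("[a-zA-Z]*([a-zA-Z])\\1[a-zA-Z]*");

    // True if the whole word matches the naive pattern.
    static boolean matches(String word) {
        return NAIVE.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Helo", "programming", "belllike", "shellless"}) {
            System.out.println(w + " --> " + matches(w));
        }
    }
}
```

Helo is rejected and programming and shellless are accepted as intended, but belllike is accepted as well, which is exactly the problem.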
I also tried to use negative lookaheads and lookbehinds. For one letter, this may look like this:
[a-zA-Z]*(?<!a)aa(?!a)[a-zA-Z]*
This regex fulfills all requirements for the letter a, but neither I nor the people I asked could generalize it with capture groups so that it works for any letter (copy-pasting this expression 26 times, once for each letter, and combining the copies with OR is not the solution I am looking for, even though it would probably work).
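For reference, this is roughly how the single-letter version behaves in Java. It is only a sketch: the class name and the test words are invented for illustration, and the pattern is the one for the letter a shown above.

```java
import java.util.regex.Pattern;

class ExactlyTwoA {
    // Lookbehind and lookahead make sure the "aa" is not part of a longer run of a's.
    static final Pattern TWO_A =
            Pattern.compile("[a-zA-Z]*(?<!a)aa(?!a)[a-zA-Z]*");

    // True if the whole word contains a run of exactly two a's.
    static boolean matches(String word) {
        return TWO_A.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"baab", "baaab", "aaabaa", "bb"}) {
            System.out.println(w + " --> " + matches(w));
        }
    }
}
```

A word like bb is rejected here because the pattern only knows about a, which is the limitation described above.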
A solution for the described problem would be great, of course. If it cannot be done with regex, I would be equally happy with an explanation of why that is not possible.
This task was part of an assignment I had to do for uni. In a later conversation, the prof stated that they didn't actually mean to ask that question and were fine with accepting sequences of three or more identical characters. However, struggling to find a solution made me curious whether this is actually possible with regex and, if so, how it could be done.
Even though the initial task should be done in the Java 8+ regex flavour, I would be fine with a solution in any regex flavour that solves the described problem.
Upvotes: 4
Views: 1392
Reputation: 18611
Use
^(.)\1(?!\1)|(.?)(?!\2)(.)\3(?!\3)
See proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\1 what was matched by capture \1
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\1 what was matched by capture \1
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
.? any character except \n (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\2 what was matched by capture \2
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
\3 what was matched by capture \3
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\3 what was matched by capture \3
--------------------------------------------------------------------------------
) end of look-ahead
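A quick way to check this pattern in Java, assuming it is used as a partial match with Matcher.find() rather than matches(), since it has no trailing .* and no end anchor. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

class ExactlyTwoFind {
    // Alternative 1: a double at the very start of the string.
    // Alternative 2: a different preceding character, then a double,
    // in both cases not followed by the same character again.
    static final Pattern EXACTLY_TWO =
            Pattern.compile("^(.)\\1(?!\\1)|(.?)(?!\\2)(.)\\3(?!\\3)");

    // True if the word contains a run of exactly two identical characters.
    static boolean hasExactDouble(String word) {
        return EXACTLY_TWO.matcher(word).find();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Helo", "programming", "belllike", "shellless"}) {
            System.out.println(w + " --> " + hasExactDouble(w));
        }
    }
}
```

Note that when (.?) matches the empty string, the lookahead (?!\2) can never succeed, so the second alternative effectively requires a preceding character; the first alternative covers the case where the double sits at the start of the word.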
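If the regex flavour supports infinite-width lookbehinds (the lookbehind below contains a backreference, which many fixed-width implementations reject):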
(.)\1(?!\1)(?<!\1..)
See proof.
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\1 what was matched by capture \1
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\1 what was matched by capture \1
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
\1 what was matched by capture \1
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
) end of look-behind
Upvotes: 0
Reputation: 75860
You can try:
^(?:.*?(.)(?!\1))?(.)\2(?!\2).*$
See a demo
^
- Start line anchor.
(?:
- Open non-capture group:
.*?
- 0+ chars other than newline (lazy) up to;
(.)(?!\1)
- A first capture group of a single char other than newline, asserting that it is not followed by the same char by means of a negative lookahead holding a backreference to this char.
)?
- Close non-capture group and make it optional.
(.)\2(?!\2)
- The same construct as before, except that this time a backreference to the 2nd capture group sits between the group and the negative lookahead, asserting that the position is followed by the exact same char.
.*
- 0+ chars other than newline (greedy) up to;
$
- End line anchor.
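Since the question targets Java, here is a minimal sketch of how this pattern could be tried out there. The class and method names are invented for illustration; the pattern is anchored and ends in .*, so it works with Matcher.matches() on the whole word.

```java
import java.util.regex.Pattern;

class ExactlyTwoMatch {
    // Optional prefix ending in a char that differs from the next one,
    // then a doubled char that is not followed by the same char again.
    static final Pattern EXACTLY_TWO =
            Pattern.compile("^(?:.*?(.)(?!\\1))?(.)\\2(?!\\2).*$");

    // True if the whole word contains a run of exactly two identical characters.
    static boolean matches(String word) {
        return EXACTLY_TWO.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Helo", "programming", "belllike", "shellless"}) {
            System.out.println(w + " --> " + matches(w));
        }
    }
}
```

For the four sample words from the question this prints false, true, false, true, in that order.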
Upvotes: 6