Reputation: 36058
How can I use the or operator (|) while not allowing repetition? In other words, the regex:
(word1|word2|word3)+
will match word1word2, but it will also match word1word1, which I don't want because word1 is repeated. How can I avoid repetition?
In summary, I would like the following subjects to match:
word1word2word3
word1
word2
word3word2
Note that all of them match because there is no repetition. I would like the following subjects to fail:
word1word2word1
word2word2
word3word1word2word2
Thanks to @Mark, I now have:
(?xi)
(?:
(?<A>word1|word2)(?! .* \k<A> ) # match word1 or word2, but make sure the captured word does not appear again later in the subject
| (?<B>word3|word4)(?! .* \k<B> )
)+
I use named groups because I am interested in seeing whether something was captured in group A or B.
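For readers who want to try the named-group version outside .NET, here is a minimal Python sketch. It assumes Python's spelling of named groups, (?P<A>...) and (?P=A), and it uses fullmatch so the whole subject must be consumed, which the pattern above does not enforce by itself. The names PATTERN and captured_groups are made up for illustration.

```python
import re

# Python spells named groups (?P<A>...) and named backreferences (?P=A).
# Flags are passed as arguments instead of the inline (?xi) prefix.
PATTERN = re.compile(
    r"""
    (?:
        (?P<A>word1|word2)(?! .* (?P=A) )   # group A word, not seen again later
      | (?P<B>word3|word4)(?! .* (?P=B) )   # group B word, not seen again later
    )+
    """,
    re.VERBOSE | re.IGNORECASE,
)


def captured_groups(subject):
    """Return (A, B) for a whole-string match, or None if there is no match.

    A group that never took part in the match is reported as None.
    """
    m = PATTERN.fullmatch(subject)
    return (m.group("A"), m.group("B")) if m else None


if __name__ == "__main__":
    for s in ["word1word3", "word1", "word1word2word1"]:
        print(s, captured_groups(s))
```

When a group matches in more than one iteration, Python reports only the last text it captured, so this tells you whether A or B took part, not every word they matched.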
Upvotes: 13
Views: 39619
Reputation: 33918
The lookahead solutions will not work in several cases. You can solve this properly, without lookarounds, by using a construct like this:
(?:(?(1)(?!))(word1)|(?(2)(?!))(word2)|(?(3)(?!))(word3))+
This works even if some words are substrings of others, and it also works if you just want to find the matching substrings of a larger string (rather than only match the whole string).
It works by failing an alternative of the alternation if its group has already matched, which is what (?(1)(?!)) does. (?(1)foo) is a conditional that matches foo if group 1 has previously matched, and (?!) always fails.
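The same idea, "refuse a word once it has been used", can be sketched without regex conditionals. The following is not the regex above but a plain Python stand-in for whole-string matching, written under the assumption that the subject must consist only of the listed words. The function name matches_without_repeats is hypothetical.

```python
def matches_without_repeats(subject, words):
    """Return True if subject is a concatenation of one or more distinct words.

    Mirrors the conditional regex: a word that has already been consumed
    is no longer available as an alternative.
    """

    def walk(pos, used):
        if pos == len(subject):
            return pos > 0  # like '+', require at least one word
        for w in words:
            # Skip words already used; otherwise try to consume w here.
            if w not in used and subject.startswith(w, pos):
                if walk(pos + len(w), used | {w}):
                    return True
        return False  # no alternative worked, backtrack

    return walk(0, frozenset())


if __name__ == "__main__":
    words = ["word1", "word2", "word3"]
    print(matches_without_repeats("word3word2", words))
    print(matches_without_repeats("word2word2", words))
    # One word is a prefix of the other; no word is repeated.
    print(matches_without_repeats("ababc", ["ab", "abc"]))
```

Because the check is "was this word used", not "does this text appear later", a word that is a substring of another word does not cause a false rejection.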
Upvotes: 4
Reputation: 13641
You could use a negative look-ahead containing a back reference:
^(?:(word1|word2|word3)(?!.*\1))+$
where \1 refers to the text matched by the capture group (word1|word2|word3).
Note that this assumes word2 cannot be formed by appending characters to word1, and that word3 cannot be formed by appending characters to word1 or word2.
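To make that assumption concrete, here is a small Python sketch of the backreference pattern. The helper no_repeats and the words "ab" and "abc" are invented for illustration; "abc" is "ab" with a character appended, which is exactly the case the note warns about.

```python
import re


def no_repeats(subject, words):
    """Apply ^(?:(w1|w2|...)(?!.*\\1))+$ built from the given words."""
    alternation = "|".join(re.escape(w) for w in words)
    pattern = rf"^(?:({alternation})(?!.*\1))+$"
    return re.search(pattern, subject) is not None


if __name__ == "__main__":
    words = ["word1", "word2", "word3"]
    print(no_repeats("word1word2word3", words))
    print(no_repeats("word1word2word1", words))
    # "ab" then "abc": no word is repeated, yet the lookahead sees
    # "ab" again inside "abc" and rejects the subject.
    print(no_repeats("ababc", ["ab", "abc"]))
```

The last call is rejected even though neither word occurs twice, so the pattern is only safe when no word can appear inside a later word.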
Upvotes: 0
Reputation: 31656
Byers' solution is too hard-coded and becomes cumbersome as the number of words increases. Why not simply have the regex look for a duplicate match?
([^\d]+\d)+(?=.*\1)
If that matches, a repetition has been found in the subject. If it does not match, you have a valid set of data.
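A minimal Python sketch of this inverted check follows. It assumes, as the pattern does, that every word is a run of non-digits followed by a single digit. The names DUPLICATE and has_repetition are hypothetical.

```python
import re

# Any token of the form <non-digits><digit> that shows up again later
# in the subject counts as a repetition.
DUPLICATE = re.compile(r"([^\d]+\d)+(?=.*\1)")


def has_repetition(subject):
    """Return True if the duplicate-detecting pattern finds a match."""
    return DUPLICATE.search(subject) is not None


if __name__ == "__main__":
    for s in ["word1word2word3", "word3word2", "word1word2word1", "word2word2"]:
        print(s, "invalid" if has_repetition(s) else "valid")
```

Note the logic is reversed compared with the other answers: a match means the subject is invalid. The pattern also does not check that the subject consists only of the allowed words.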
Upvotes: 0
Reputation: 838376
You could use negative lookaheads:
^(?:word1(?!.*word1)|word2(?!.*word2)|word3(?!.*word3))+$
See it working online: rubular
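As a quick check outside Rubular, here is a Python sketch that builds the same pattern from a word list and runs it against the subjects from the question. The helper build_pattern is an illustrative name, not part of any library.

```python
import re


def build_pattern(words):
    """Build ^(?:w1(?!.*w1)|w2(?!.*w2)|...)+$ from a list of literal words."""
    parts = [f"{re.escape(w)}(?!.*{re.escape(w)})" for w in words]
    return re.compile("^(?:" + "|".join(parts) + ")+$")


if __name__ == "__main__":
    pattern = build_pattern(["word1", "word2", "word3"])
    should_match = ["word1word2word3", "word1", "word2", "word3word2"]
    should_fail = ["word1word2word1", "word2word2", "word3word1word2word2"]
    for s in should_match + should_fail:
        print(s, bool(pattern.search(s)))
```

Generating the alternation from a list keeps the pattern manageable as the number of words grows, at the cost of one lookahead per word.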
Upvotes: 10