Tono Nam

Reputation: 36058

Regex 'or' operator avoid repetition

How can I use the or (|) operator without allowing repetition? In other words, the regex:

(word1|word2|word3)+

will match word1word2, but it will also match word1word1, which I don't want because word1 is repeated. How can I avoid repetition?

In summary, I would like the following subjects to match:

word1word2word3
word1
word2
word3word2

Note that all of them match because there is no repetition. I would like the following subjects to fail:

word1word2word1
word2word2
word3word1word2word2

Edit

Thanks to @Mark, I now have:

(?xi)

(?:  
        (?<A>word1|word2)(?!  .*  \k<A> )      # match word1 or word2, but only if the word just captured does not appear again later in the subject
    |   (?<B>word3|word4)(?!  .*  \k<B> )
)+

because I am interested in seeing if something was captured in group A or B.
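For readers using Python rather than .NET, here is a rough sketch of the same idea. Python's re module spells the named group as (?P<A>...) and the backreference as (?P=A). The ^ and $ anchors and the groups_hit helper are my own additions for illustration and are not part of the pattern above.

```python
import re

# Python spelling of the pattern above. The ^ and $ anchors are an
# assumption added here so that the whole subject has to match.
PATTERN = re.compile(
    r"""
    ^(?:
          (?P<A>word1|word2)(?! .* (?P=A) )   # group A word that does not reappear later
        | (?P<B>word3|word4)(?! .* (?P=B) )   # group B word that does not reappear later
    )+$
    """,
    re.VERBOSE | re.IGNORECASE,
)


def groups_hit(subject):
    """Hypothetical helper: return (A, B) captures, or None if the subject is rejected."""
    m = PATTERN.match(subject)
    return None if m is None else (m.group("A"), m.group("B"))


print(groups_hit("word1word3"))  # both groups captured something
print(groups_hit("word4"))       # only group B captured something
print(groups_hit("word1word1"))  # rejected because word1 repeats
```

Each named group only keeps the last word it matched, so this tells you whether A or B took part in the match, not every word they consumed.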

Upvotes: 13

Views: 39619

Answers (4)

Qtax

Reputation: 33918

The lookahead solutions will not work in several cases. You can solve this properly, without scanning ahead through the rest of the subject, by using a construct like this:

(?:(?(1)(?!))(word1)|(?(2)(?!))(word2)|(?(3)(?!))(word3))+

This works even if some words are substrings of others, and it also works if you just want to find the matching substrings of a larger string (rather than match the whole string).

Live demo.

It works by failing an alternative if its word has already been matched, which is what (?(1)(?!)) does. (?(1)foo) is a conditional that matches foo only if group 1 has previously matched, and (?!) always fails.
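As a minimal sketch of how this could be tried in Python (whose re module also supports (?(n)...) conditionals), the snippet below builds the pattern from a word list. The no_repeat_pattern helper, the added ^ and $ anchors, and the cat/cats words are hypothetical choices for illustration only.

```python
import re


def no_repeat_pattern(words):
    """Hypothetical helper: one numbered group per word, each guarded by
    (?(n)(?!)), which fails as soon as group n has already matched."""
    alternatives = "|".join(
        f"(?({n})(?!))({re.escape(word)})"
        for n, word in enumerate(words, start=1)
    )
    # Anchors are an assumption here, used to test whole subjects.
    return re.compile(f"^(?:{alternatives})+$")


WORDS = no_repeat_pattern(["word1", "word2", "word3"])

for subject in ["word1word2word3", "word3word2", "word1word2word1", "word2word2"]:
    print(subject, WORDS.match(subject) is not None)

# One word is a prefix of the other, yet each may still be used once.
print(no_repeat_pattern(["cat", "cats"]).match("catcats") is not None)
```

Because each word has its own group, the check is "has this exact alternative already been used", not "does this text appear later", which is why prefixes and substrings are not a problem.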

Upvotes: 4

MikeM

Reputation: 13641

You could use a negative look-ahead containing a back reference:

^(?:(word1|word2|word3)(?!.*\1))+$

where \1 refers to the match of the capture group (word1|word2|word3).

Note that this assumes word2 cannot be formed by appending characters to word1, and that word3 cannot be formed by appending characters to word1 or word2.
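Here is a small Python sketch of the pattern and of the caveat in the note. The is_valid helper and the cat/cats pair are assumptions made up for the demonstration.

```python
import re

NO_REPEAT = re.compile(r"^(?:(word1|word2|word3)(?!.*\1))+$")


def is_valid(subject):
    """Hypothetical helper: True when no word occurs more than once."""
    return NO_REPEAT.match(subject) is not None


for subject in ["word1word2word3", "word3word2", "word1word2word1", "word2word2"]:
    print(subject, is_valid(subject))

# The caveat: "cats" is "cat" plus one character, so after matching "cat"
# the lookahead finds "cat" inside the later "cats" and rejects the subject,
# even though no word is actually repeated.
PREFIXED = re.compile(r"^(?:(cat|cats)(?!.*\1))+$")
print(PREFIXED.match("catcats"))
```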

Upvotes: 0

ΩmegaMan

Reputation: 31656

Byers' solution is too hard-coded and gets quite cumbersome as the number of words increases. Why not simply have the regex look for a duplicate match?

([^\d]+\d)+(?=.*\1)

If that matches, a repetition has been found in the subject. If it does not match, you have a valid set of data.
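A quick Python sketch of this inverted check, assuming (as the pattern does) that every word is a run of non-digits followed by a single digit. The has_repetition helper name is made up for the example.

```python
import re

# A match means a word shows up again later in the subject.
DUPLICATE = re.compile(r"([^\d]+\d)+(?=.*\1)")


def has_repetition(subject):
    """Hypothetical helper: True when the duplicate-finding regex matches."""
    return DUPLICATE.search(subject) is not None


for subject in ["word1word2word3", "word3word2", "word1word2word1", "word2word2"]:
    verdict = "invalid" if has_repetition(subject) else "valid"
    print(subject, verdict)
```

Note that this only detects repetition. It does not check that the subject consists solely of the allowed words, so that would need a separate test.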

Upvotes: 0

Mark Byers

Reputation: 838376

You could use negative lookaheads:

^(?:word1(?!.*word1)|word2(?!.*word2)|word3(?!.*word3))+$

See it working online: rubular
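If you would rather check it locally, the following Python sketch runs the pattern against the subjects from the question. The SHOULD_MATCH and SHOULD_FAIL lists are copied from the question, and the variable names are only illustrative.

```python
import re

PATTERN = re.compile(r"^(?:word1(?!.*word1)|word2(?!.*word2)|word3(?!.*word3))+$")

SHOULD_MATCH = ["word1word2word3", "word1", "word2", "word3word2"]
SHOULD_FAIL = ["word1word2word1", "word2word2", "word3word1word2word2"]

for subject in SHOULD_MATCH + SHOULD_FAIL:
    # Each word is accepted only if the same word does not occur later on.
    print(subject, PATTERN.match(subject) is not None)
```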

Upvotes: 10
