Sol

Reputation: 889

Match multiple characters without repetition in a regular expression

I'm using PHP's PCRE, and there is one part of the regex I can't work out. I have a character class of five characters, [adjxz], which may or may not appear, in any order, after a token (|) in the string. All of them can appear, but each can appear only once. For example:

 *|ad     - is valid
 *|dxa    - is valid
 *|da     - is valid
 *|a      - is valid
 *|aaj    - is *not* valid
 *|adjxz  - is valid
 *|addjxz - is *not* valid

Any idea how I can do this? A simple [adjxz]+, or even [adjxz]{1,5}, does not work because both allow repetition. Since the order does not matter either, I can't use /a?d?j?x?z?/, so I'm at a loss.

Upvotes: 0

Views: 108

Answers (3)

alpha bravo

Reputation: 7948

I suggest using reverse logic: match the unwanted case (a character that appears again later in the string) with this pattern:
\|.*?([adjxz])(?=.*\1)
Demo
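As a rough sketch of how the reverse-logic pattern could be used from PHP: the string is treated as acceptable when the pattern does *not* match. The helper name `hasNoRepeatedFlag` is hypothetical, and this check only detects repetition; it does not verify that every character after `|` belongs to [adjxz].

```php
<?php
// Hypothetical helper (name not from the answer): true when no [adjxz]
// character after a "|" occurs again later in the string.
function hasNoRepeatedFlag(string $str): bool
{
    // The pattern matches the INVALID case: a flag character captured
    // in group 1 that the lookahead finds again further along.
    // preg_match() returns 0 when there is no match.
    return preg_match('/\|.*?([adjxz])(?=.*\1)/', $str) === 0;
}
```

With the samples from the question, `*|ad` and `*|adjxz` pass this check, while `*|aaj` and `*|addjxz` do not.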

Upvotes: 0

Wouter J

Reputation: 41934

I think you should break this into two steps:

  1. A regex to check for unexpected characters
  2. A simple PHP check for duplicated characters
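function strIsValid($str) {
    // Step 1: the string must be "*|" followed only by characters from [adjxz].
    // The "|" is escaped so it is a literal pipe, not regex alternation.
    if (!preg_match('/^\*\|([adjxz]+)$/', $str, $matches)) {
        return false;
    }

    // Step 2: valid only if no character occurs more than once.
    return strlen($matches[1]) === count(array_unique(str_split($matches[1])));
}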

Upvotes: 1

p.s.w.g

Reputation: 149020

You could use a negative lookahead combined with a backreference, like this:

\|(?![adjxz]*([adjxz])[adjxz]*\1)[adjxz]{1,5}

demonstration
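A minimal sketch of this pattern in PHP, assuming the flags run to the end of the string (the trailing `$` anchor and the helper name `isValidFlagString` are additions for illustration, not part of the pattern above):

```php
<?php
// Hypothetical helper: true when "|" is followed by 1 to 5 distinct
// characters from [adjxz] and nothing else up to the end of the string.
function isValidFlagString(string $str): bool
{
    // The negative lookahead rejects any run of flags in which one
    // character (group 1) is seen a second time via \1.
    // The "$" anchor is an assumption added for whole-string validation.
    $pattern = '/\|(?![adjxz]*([adjxz])[adjxz]*\1)[adjxz]{1,5}$/';

    return preg_match($pattern, $str) === 1;
}
```

Without the end anchor, a string such as `*|adq` would still match on the `|ad` prefix, so some form of terminator is needed when validating rather than searching.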

If you know these characters are followed by something else, e.g. whitespace, you can simplify this to:

\|(?!\S*(\S)\S*\1)[adjxz]{1,5}
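The simplified form might be used to pull valid flag groups out of a longer, whitespace-separated string. In this sketch the helper name `findFlagGroups`, the extra capturing group around the flags, and the trailing `(?!\S)` check (flags must end at whitespace or end of input) are all assumptions added for illustration.

```php
<?php
// Hypothetical helper: returns every flag group that follows a "|",
// contains no repeated character, and ends at whitespace or end of input.
function findFlagGroups(string $text): array
{
    // Group 1 lives inside the negative lookahead and detects a repeat.
    // Group 2 captures the flags themselves.
    // (?!\S) is an added assumption so "|adq" is not accepted as "|ad".
    $pattern = '/\|(?!\S*(\S)\S*\1)([adjxz]{1,5})(?!\S)/';

    preg_match_all($pattern, $text, $matches);

    return $matches[2];
}
```

For example, `findFlagGroups('*|ad foo *|aaj bar *|dxa')` returns `['ad', 'dxa']`, skipping the group with the repeated `a`.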

Upvotes: 1
