Reputation: 23

regex lookahead problems with greedy quantifier

Need to support the following formats

3 digits, followed by an optional space, followed by up to three non-repeating characters from the character set ACERV (a space is valid only between two of the characters)

Valid formats:

123
123 A
123 A v
123 CER

Invalid formats:

123A
123 AA
123 A  - when followed by a space

What I have so far (I might be overcomplicating this with lookaheads that are not necessarily required):

^([0-9]{3})                                         # - first 3 digits
 (\s(?=[ACERV]))([ACERV])                           # - allow space only when followed by ACERV
 (?!\3)(?=[ACERV ]{0,1})([ACERV ]{0,1})             # - do not allow 1st char to repeat
 (?!\3)                                             # - do not allow 1st char to repeat
 (?!\4)                                             # - do not allow 2nd to repeat
 (?!\s)                                             # - do not allow trailing space
 (?=[ACERV]{0,1})([ACERV]{0,1})|[0-9]{3}$

When the lookahead (?!\4) is added, it fails to match the valid format 123 A. Changing (?!\4) to (?!\4)* or (?!\4)? allows 123 A to match, but also allows the 1st or 2nd char to be repeated.
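A likely cause, sketched in JavaScript (the question does not name a regex flavor, so the language is an assumption): group 4 is ([ACERV ]{0,1}), which is allowed to match the empty string. A backreference to an empty capture matches at any position, so the negative lookahead (?!\4) can never succeed in that case. The two patterns below are reduced, hypothetical stand-ins for the original expression, not the expression itself.

```javascript
// Reduced stand-in: group 1 may match nothing, like ([ACERV ]{0,1}) in the question.
const withoutLookahead = /^(a?)b$/;
const withLookahead = /^(a?)(?!\1)b$/;

// Tests one input that leaves group 1 empty and one that fills it.
function compare(pattern) {
  return { emptyGroup: pattern.test("b"), filledGroup: pattern.test("ab") };
}

console.log(compare(withoutLookahead)); // { emptyGroup: true, filledGroup: true }
console.log(compare(withLookahead));    // { emptyGroup: false, filledGroup: true }
```

Making the lookahead optional with ? or * removes the failure only because the assertion is then allowed to be skipped, which would explain why repeats start getting through.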

Upvotes: 2

Answers (3)

user557597

Reputation:

I'm not totally sure of the requirements, but this works on your samples.

 # ^(?i)\d{3}(?:[ ](?:([ACERV])[ ]?(?![ACERV ]*\1)){1,3}(?<![ ]))?$

 ^                      # BOL
 (?i)                   # Case insensitive modifier
 \d{3}                  # 3 digits
 (?:                    # Cluster grp, character block (optional)
      [ ]                    # Space, required
      (?:                    # Cluster grp
           ( [ACERV] )            # (1), Capture single character [ACERV]
           [ ]?                   # [ ], optional
           (?!                    # Negative lookahead
                [ACERV ]*              # As many [ACERV] or [ ] needed
                \1                     # to find what is captured in group 1
                                       # Found it, the assertion fails
           )                      # End Negative lookahead
      ){1,3}                 # End Cluster grp, gets 1-3 [ACERV] characters
      (?<! [ ] )             # No dangling [ ] at end
 )?                     # End Cluster grp, character block (optional)
 $                      # EOL

Update: adjusted to replace the lookbehind with a lookahead.

 # ^(?i)\d{3}(?!.*[ ]$)(?:[ ](?:([ACERV])[ ]?(?![ACERV ]*\1)){1,3})?$

 ^                      # BOL
 (?i)                   # Case insensitive modifier
 \d{3}                  # 3 digits
 (?! .* [ ] $ )         # No dangling [ ] at end
 (?:                    # Cluster grp, character block (optional)
      [ ]                    # Space, required
      (?:                    # Cluster grp
           ( [ACERV] )            # (1), Capture single character [ACERV]
           [ ]?                   # [ ], optional
           (?!                    # Negative lookahead
                [ACERV ]*              # As many [ACERV] or [ ] needed
                \1                     # to find what is captured in group 1
                                       # Found it, the assertion fails
           )                      # End Negative lookahead
      ){1,3}                 # End Cluster grp, gets 1-3 [ACERV] characters
 )?                     # End Cluster grp, character block (optional)
 $                      # EOL
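For anyone trying these in JavaScript (an assumption, since the answer does not name a flavor): as far as I know, JavaScript does not accept the standalone (?i) modifier, so the sketch below moves case-insensitivity to the i flag. The lookbehind variant also needs an engine with ES2018 lookbehind support. The samples are the ones from the question.

```javascript
// The i flag replaces the inline (?i) modifier used in the answer.
const withLookbehind =
  /^\d{3}(?:[ ](?:([ACERV])[ ]?(?![ACERV ]*\1)){1,3}(?<![ ]))?$/i;
const withoutLookbehind =
  /^\d{3}(?!.*[ ]$)(?:[ ](?:([ACERV])[ ]?(?![ACERV ]*\1)){1,3})?$/i;

// Valid and invalid samples copied from the question.
const samples = ["123", "123 A", "123 A v", "123 CER", "123A", "123 AA", "123 A "];

const lookbehindResults = samples.map((s) => withLookbehind.test(s));
const lookaheadResults = samples.map((s) => withoutLookbehind.test(s));

console.log(lookbehindResults);
console.log(lookaheadResults);
// Both print [ true, true, true, true, false, false, false ]
```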

Upvotes: 1

user663031

Reputation:

One plan is a simple regexp to pull apart the string, then a second step to validate that the characters are not repeating.

// check all characters in a string are unique,
// by ensuring that each character is its own first appearance
function unique_characters(str) {
    return str.split('').every(function(chr, i, chrs) {
        return chrs.indexOf(chr) === i;
    });
}

// check that the code is valid
function valid_code(str) {
    var spacepos = str.indexOf(' ');
    return unique_characters(str) &&
        (spacepos === -1 || (spacepos === 1 && str.length === 3));
}

// check basic format and pull out code portion
function check_string(str) {
    var matches = str.match(/^\d{3} ?([ACERV ]{0,3})$/i);
    // declare locally (no implicit global) and always return a boolean
    var valid = matches !== null && valid_code(matches[1]);
    return valid;
}

>> inputs = ['123', '123 A', '123 A v', '123 CER', '123A', '123 AA', '123 A ']
>> inputs.map(check_string)
[true, true, true, true, true, false, false]

The fifth test case (123A) shows as valid because, if the space is indeed optional and 123 A is valid, then it would seem that 123A should be valid as well.

A possible advantage of this kind of approach is that if additional validation rules are introduced, they can be implemented more easily than mucking around inside a huge regexp.
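As a rough sketch of that extensibility, here is a hypothetical stricter variant in which the space before the code portion is mandatory and the uniqueness check ignores case. The names rules and checkStringStrict are made up for illustration, and the stricter rules are assumptions about the requirements rather than something the question states.

```javascript
// Hypothetical rule list; each rule receives the code portion after the space.
const rules = [
  // no leading or trailing space in the code portion
  (code) => code.trim() === code,
  // each letter appears only once, ignoring case
  (code) => {
    const letters = code.toUpperCase().replace(/ /g, "").split("");
    return new Set(letters).size === letters.length;
  },
];

// Requires a space before the code portion, so "123A" is rejected.
function checkStringStrict(str) {
  const matches = str.match(/^\d{3}(?: ([ACERV ]{1,3}))?$/i);
  if (!matches) return false;
  if (matches[1] === undefined) return true; // digits only
  return rules.every((rule) => rule(matches[1]));
}

const strictInputs = ["123", "123 A", "123 A v", "123 CER", "123A", "123 AA", "123 A "];
console.log(strictInputs.map(checkStringStrict));
// [ true, true, true, true, false, false, false ]
```

A new requirement then becomes one more function in the list, with the regexp left alone.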

Upvotes: 0

nu11p01n73R

Reputation: 26667

How about this regex:

^\d{3}(?:$|\s)(?:([ACERV])(?!\1)|\s(?!$|\1))*$

It will match the strings

123
123 A
123 A V
123 CER

See how the regex matches at http://regex101.com/r/mW5qZ9/9

^ anchors the regex at the beginning of the string
\d{3} matches 3 occurrences of any digit
(?:$|\s) matches either the end of the string $ or a whitespace character \s
(?:([ACERV])(?!\1)|\s(?!$|\1))* matches the characters that follow and rejects an immediate repeat
- (?: ) non-capturing group with two alternatives
- ([ACERV]) captures one character from the class
  - (?!\1) asserts that the captured character is not immediately followed by the same character
  - \s(?!$|\1) matches a whitespace character only if it is followed neither by the end of the string nor by the character last captured in group 1
- * repeats the group zero or more times, so the pattern itself does not limit the number of characters to 3
$ anchors the regex at the end of the string.
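A quick check of the pattern in JavaScript (the language is an assumption; the answer links to regex101 without naming a flavor). As far as the pattern can be read, (?!\1) only looks at the very next character, so a repeat separated by another letter is not caught, and nothing limits the count to three. The inputs below have no inner space, because engines differ in how a backreference behaves once its group is repeated.

```javascript
// Pattern copied from the answer, without flags.
const pattern = /^\d{3}(?:$|\s)(?:([ACERV])(?!\1)|\s(?!$|\1))*$/;

const results = {
  adjacentRepeat: pattern.test("123 AA"),    // same letter twice in a row
  separatedRepeat: pattern.test("123 ACA"),  // repeat with a letter in between
  fiveLetters: pattern.test("123 ACERV"),    // more than three letters
};

console.log(results);
// { adjacentRepeat: false, separatedRepeat: true, fiveLetters: true }
```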

Upvotes: 0
