xautjzd

Reputation: 309

What's the difference between regex{m,n} and (regex){m,n}?

I'm developing a Docker project and need to write a regex to validate repository names. The requirements are as follows:

  1. Only ASCII characters are allowed, and no uppercase letters.
  2. The only special characters allowed are dot (.), hyphen (-) and underscore (_).
  3. The name must start and end with a letter or a digit.
  4. Special characters can't appear consecutively.
  5. Length limit (min: 2, max: 255).

My regex is:

([a-z0-9]+(?:[._-][a-z0-9]+)*){2,255}

but it fails to match when the repository name is e-e_1.1

When I change it to:

[a-z0-9]+(?:[._-][a-z0-9]+)*{2,255}

it matches.

Can someone explain why? Thank you in advance.

Upvotes: 8

Views: 527

Answers (1)

Wiktor Stribiżew

Reputation: 627101

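In the ([a-z0-9]+(?:[._-][a-z0-9]+)*){2,255} regex, the limiting quantifier {2,255} is applied to the whole pattern inside Group 1, ([a-z0-9]+(?:[._-][a-z0-9]+)*). That means the group pattern must be repeated 2 to 255 times in a row. It does not mean the whole string length is restricted to 2 to 255 characters. Each repetition has to start and end with a letter or digit, so two consecutive repetitions need two adjacent letters or digits somewhere in the string. e-e_1.1 has none, which is why it fails.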
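To make the point concrete, here is a minimal Python sketch (the question does not say which regex engine is in use, so Python's `re` module is an assumption, and `grouped_matches` is a hypothetical helper name). It checks whole strings against the grouped pattern:

```python
import re

# The asker's first pattern: {2,255} repeats Group 1 as a whole,
# it does not count characters.
GROUPED = re.compile(r'([a-z0-9]+(?:[._-][a-z0-9]+)*){2,255}')

def grouped_matches(name: str) -> bool:
    """Return True if the entire name matches the grouped pattern."""
    return GROUPED.fullmatch(name) is not None

print(grouped_matches("e-e_1.1"))   # False: no two adjacent alphanumerics to split on
print(grouped_matches("ab"))        # True: "a" + "b" are two repetitions of the group
print(grouped_matches("a" * 300))   # True: length is not limited at all
```

The last call shows that a 300-character name is still accepted, since one repetition of the group can be arbitrarily long.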

Now, your [a-z0-9]+(?:[._-][a-z0-9]+)*{2,255} regex can match a string of unlimited length, too, because the part matched with [a-z0-9]+ can have 1 or more characters, and (?:[._-][a-z0-9]+)* can match zero or more characters. The limiting quantifier {2,255} does not restrict the string length here either.

To restrict the length of the input string to 2 to 255 characters, you will have to use a lookahead anchored at the start:

^(?=.{2,255}$)[a-z0-9]+(?:[._-][a-z0-9]+)*$
 ^^^^^^^^^^^^^

The (?=.{2,255}$) lookahead is executed only once, at the beginning of the string, and a match is found only if the condition inside the lookahead is met: there must be 2 to 255 characters up to the end of the string. (. matches any character other than a newline, but that is not important here because the pattern that follows only allows specific characters.)
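A short sketch of the lookahead pattern in use, again assuming Python's `re` module (`NAME_RE` and `is_valid_repo_name` are hypothetical names chosen for illustration):

```python
import re

# Lookahead enforces the overall length (2 to 255 characters);
# the rest of the pattern enforces the character rules.
NAME_RE = re.compile(r'^(?=.{2,255}$)[a-z0-9]+(?:[._-][a-z0-9]+)*$')

def is_valid_repo_name(name: str) -> bool:
    """Return True if name satisfies both the length and character rules."""
    return NAME_RE.match(name) is not None

print(is_valid_repo_name("e-e_1.1"))   # True
print(is_valid_repo_name("a"))         # False: shorter than 2 characters
print(is_valid_repo_name("a" * 256))   # False: longer than 255 characters
print(is_valid_repo_name("a..b"))      # False: consecutive special characters
print(is_valid_repo_name("-ab"))       # False: must start with a letter or digit
```

In Python specifically, `$` also matches just before a trailing newline, so replacing both `$` with `\Z` would be the stricter choice if input may end with one.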

Upvotes: 8
