JSK NS

Reputation: 3446

REGEX Repeater "Or" Operator

I am looking to match a regex with either 2 [0-9] repeats (and then some other pattern)

[0-9]{2}[A-z]{4}

OR 6 [0-9] repeats (and then some other pattern)

[0-9]{6}[A-z]{4}

The following is too inclusive:

[0-9]{2,6}[A-z]{4}

QUESTION

Is there a way that I can specify either 2 or 6 repeats?

Upvotes: 1

Views: 468

Answers (4)

Jerry

Reputation: 71578

The classic way would be:

(?:[0-9]{2}|[0-9]{6})[A-z]{4}

[Read literally: [0-9]{2} OR [0-9]{6}]

But you can also use this one, which should be a little more efficient than the above with less potential backtracking:

[0-9]{2}(?:[0-9]{4})?[A-z]{4}

[Here, [0-9]{2} is followed by an optional group of 4 more [0-9], for a total of either 2 or 6 digits, as required]
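As a rough check that the two forms accept the same strings, here is a minimal Python sketch. The sample strings and the helper name `full` are made up for illustration, and `[A-Za-z]` is used in place of `[A-z]` so that only letters are accepted.

```python
import re

# Two spellings of "exactly 2 or exactly 6 digits, then 4 letters".
ALTERNATION = re.compile(r"(?:[0-9]{2}|[0-9]{6})[A-Za-z]{4}")
OPTIONAL_TAIL = re.compile(r"[0-9]{2}(?:[0-9]{4})?[A-Za-z]{4}")

# Hypothetical sample inputs, not taken from the question.
samples = ["12asdf", "123456asdf", "1234asdf", "1asdf", "1234567asdf"]


def full(pattern, text):
    """Return True when the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


# Map each sample to (alternation result, optional-tail result).
results = {s: (full(ALTERNATION, s), full(OPTIONAL_TAIL, s)) for s in samples}

for s, (a, b) in results.items():
    print(f"{s!r}: alternation={a}, optional_tail={b}")
```

On these samples both patterns agree: only the strings with exactly 2 or exactly 6 leading digits match in full.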


Note that [A-z] matches more than just letters: it also matches a few other characters, which you might not be aware of.

The range [A-z] effectively is equivalent to:

[A-Z\[\\\]^_`a-z]

Notice that the additional characters that match are:

[ \ ] ^ _ `

[the spaces are only there for separation and are not among the matched characters]

This is because those characters sit between the uppercase letters and the lowercase letters in the ASCII table (and therefore in Unicode as well).
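One way to see this for yourself is to list every ASCII character that [A-z] accepts but that is not a letter. The following is a small Python sketch; the variable names are arbitrary.

```python
import re

wide = re.compile(r"[A-z]")

# Walk the ASCII range and keep the characters that [A-z] matches
# even though they are not letters.
extras = [
    chr(cp)
    for cp in range(128)
    if wide.fullmatch(chr(cp)) and not chr(cp).isalpha()
]

print(extras)  # ['[', '\\', ']', '^', '_', '`']
```

These are code points 91 through 96, the gap between Z (90) and a (97).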

Upvotes: 3

CaffGeek

Reputation: 22064

This should work

(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}

Do you have some test cases I can verify it with?

  • 12asdf - passes
  • 123456asdf - passes
  • 1234asdf - fails

However, if you don't anchor the start of the regex to a word boundary (\b) or the start of a line (^), 1234asdf will produce a partial match on 34asdf.

So either

\b(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}

or

^(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}
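The partial-match problem is easy to reproduce. Here is a minimal Python sketch using `re.search`, which scans for a match anywhere in the string; the variable names are only illustrative.

```python
import re

unanchored = re.compile(r"(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}")
word_anchored = re.compile(r"\b(?:[0-9]{2}|[0-9]{6})[a-zA-Z]{4}")

text = "1234asdf"

# Without an anchor the engine is free to start at the third digit.
partial = unanchored.search(text)
print(partial.group(0))  # 34asdf

# With \b the match has to start at a word boundary, and the only
# boundary before a digit here is the start of the string, so the
# search fails.
anchored = word_anchored.search(text)
print(anchored)  # None
```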

As a quick rundown of the regex changes

  • (?: ) creates a non capturing group
  • | selects between the alternatives [0-9]{2} and [0-9]{6}
  • ^ matches the start of a line
  • $ matches the end of a line
  • \b matches a word boundary
  • [a-zA-Z] is being used instead of [A-z] as it's likely what was intended (all alpha characters, regardless of case)

You can also replace your [0-9]s with \d, which is shorthand for any digit. The best way I can think of to write this without getting partial matches is as follows:

(?:\b|^)(?:\d{2}|\d{6})[a-zA-Z]{4}(?:\b|$)
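To try the fully anchored pattern against tokens embedded in a longer string, something like this Python sketch should do. The sample text is invented for illustration. One caveat: in Python 3, `\d` on a `str` pattern also matches non-ASCII decimal digits unless `re.ASCII` is passed, so `[0-9]` is the stricter choice there.

```python
import re

pattern = re.compile(r"(?:\b|^)(?:\d{2}|\d{6})[a-zA-Z]{4}(?:\b|$)")

# Hypothetical input mixing valid and invalid tokens.
text = "ok 12asdf, ok 123456asdf, bad 1234asdf, bad 12asdfg"

# The pattern has no capturing groups, so findall returns whole matches.
found = pattern.findall(text)
print(found)  # ['12asdf', '123456asdf']
```

The token with 4 digits is rejected by the leading boundary, and the token with 5 letters is rejected by the trailing one.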

Upvotes: 3

dee-see

Reputation: 24078

You can use the or | like this within a non-capturing group:

(?:[0-9]{2}|[0-9]{6})[A-z]{4}

Be aware that [A-z] doesn't only include lowercase and uppercase letters, but also [, \, ], ^, _, and `, which lie between Z and a in the ASCII code points. Use [A-Za-z] for letters, as pointed out by @AlanMoore in his comment.

Upvotes: 5

Niet the Dark Absol

Reputation: 324760

Not obvious, but yes:

(?:\d{2}|\d{6})

Upvotes: 2
