oscarloo

Reputation: 171

Using regex to match numbers which have 5 increasing consecutive digits somewhere in them

First off, this has sort of been asked before. However I haven't been able to modify this to fit my requirement.

In short: I want a regex that matches an expression if and only if it only contains digits, and there are 5 (or more) increasing consecutive digits somewhere in the expression.

I understand the logic of

^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$

however, this limits the expression to exactly 5 digits. I want to allow arbitrary digits before and after the increasing run. So 1111345671111 should match, while 11111 shouldn't.

I thought this might work:

^[0-9]*(?=\d{5}0*1*2*3*4*5*6*7*8*9*)[0-9]*$

which I interpret as:

However this regex is incorrect, as for example 11111 matches. How can I solve this problem using a regex? So examples of expressions to match:

This shouldn't match:

Upvotes: 17

Views: 3539

Answers (2)

melpomene

Reputation: 85767

While this problem can be solved using pure regular expressions (the set of strictly ascending five-digit strings is finite, so you could just enumerate all of them), it's not a good fit for regexes.
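The enumeration mentioned above can be sketched in Python, assuming "increasing" means each digit is strictly greater than the one before it. This is only an illustration of the brute-force idea, not a recommended pattern; the names ASCENDING_RUNS, ENUMERATED and matches_enumerated are made up for this example.

```python
import re
from itertools import combinations

# combinations() preserves input order, so every 5-element combination of
# "0123456789" is a strictly ascending five-digit string.
ASCENDING_RUNS = ["".join(c) for c in combinations("0123456789", 5)]

# Hypothetical pattern: any digits, one of the enumerated runs, any digits.
ENUMERATED = re.compile(r"[0-9]*(?:" + "|".join(ASCENDING_RUNS) + r")[0-9]*")


def matches_enumerated(s: str) -> bool:
    """True if s is all digits and contains one of the enumerated runs."""
    return ENUMERATED.fullmatch(s) is not None


if __name__ == "__main__":
    print(len(ASCENDING_RUNS))
    for sample in ["1111345671111", "11111"]:
        print(sample, matches_enumerated(sample))
```

The alternation has 252 branches (10 choose 5), which is why this works in principle but is unpleasant to write by hand.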

That said, here's how I'd do it if I had to:

^\d*(?=\d{5}(\d*)$)0?1?2?3?4?5?6?7?8?9?\1$

Core idea: 0?1?2?3?4?5?6?7?8?9? matches an ascending numeric substring, but it doesn't restrict its length. Every single part is optional, so it can match anything from "" (empty string) to the full "0123456789".

We can force it to match exactly 5 characters by combining a look-ahead for five digits plus an arbitrary suffix (which we capture) with a backreference \1 (which must match exactly the suffix captured by the look-ahead, ensuring we have advanced exactly 5 characters in the string).
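For anyone who wants to try the look-ahead plus backreference trick outside regex101, here is a minimal sketch using Python's re module. The pattern is copied from above; the helper name has_ascending_run and the sample strings are only for illustration, and re.ASCII is my addition so that \d means 0-9 only.

```python
import re

# Pattern from the answer. The group inside the look-ahead captures the
# suffix after the 5-digit window; \1 then forces the ascending part to
# stop exactly where that suffix begins.
PATTERN = re.compile(r"^\d*(?=\d{5}(\d*)$)0?1?2?3?4?5?6?7?8?9?\1$", re.ASCII)


def has_ascending_run(s: str) -> bool:
    """True if s is all digits and has 5 strictly ascending digits in a row."""
    return PATTERN.fullmatch(s) is not None


if __name__ == "__main__":
    for sample in ["1111345671111", "11111", "0123456789", "1234", "9912345"]:
        print(sample, has_ascending_run(sample))
```

fullmatch is used instead of match so that a trailing newline, which $ alone would tolerate in Python, is rejected.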

Live demo: https://regex101.com/r/03rJET/3

(By the way, your explanation of (?=\d{5}0*1*2*3*4*5*6*7*8*9*) is incorrect: It looks ahead to match exactly 5 digits, followed by 0 or more occurrences of 0, followed by 0 or more occurrences of 1, etc.)
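A quick way to see the consequence of that reading is to run the question's attempt in Python. This is just a sketch; ATTEMPT and attempt_matches are names invented here. Since every 0*, 1*, ... 9* may match nothing, the look-ahead only checks that five digits follow.

```python
import re

# The attempted pattern from the question, copied verbatim.
ATTEMPT = re.compile(r"^[0-9]*(?=\d{5}0*1*2*3*4*5*6*7*8*9*)[0-9]*$")


def attempt_matches(s: str) -> bool:
    """True if the question's attempted pattern accepts s."""
    return ATTEMPT.fullmatch(s) is not None


if __name__ == "__main__":
    # Any string of five or more digits is accepted, ascending or not.
    for sample in ["11111", "54321", "1234"]:
        print(sample, attempt_matches(sample))
```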

Upvotes: 24

CertainPerformance

Reputation: 370679

Because the starting position of the increasing digits isn't known in advance, and the run of increasing digits doesn't necessarily end at the end of the string, the linked answer's concise pattern won't work here. I don't think this is possible without being repetitive: alternate between all possibilities of increasing digits. A 0 must be followed by [1-9], which is written 0(?=[1-9]); a 1 must be followed by [2-9]; a 2 must be followed by [3-9]; and so on. Put these alternatives in a group, repeat that group four times, and then match any digit after that (the lookahead on the fourth repetition ensures that this 5th digit is in sequence as well).

First lookahead for digits followed by the end of the string, then match the alternations described above, followed by one or more digits:

^(?=\d+$)\d*?(?:0(?=[1-9])|1(?=[2-9])|2(?=[3-9])|3(?=[4-9])|4(?=[5-9])|5(?=[6-9])|6(?=[7-9])|7(?=[89])|8(?=9)){4}\d+

Separated out for better readability:

^(?=\d+$)\d*?
  (?:
    0(?=[1-9])|
    1(?=[2-9])|
    2(?=[3-9])|
    3(?=[4-9])|
    4(?=[5-9])|
    5(?=[6-9])|
    6(?=[7-9])|
    7(?=[89])|
    8(?=9)
  ){4}
\d+

The lazy quantifier \d*? in the first line isn't necessary, but it makes the pattern a bit more efficient. Otherwise \d* initially matches the whole string greedily, requiring lots of failing alternations and backtracking until it has given back at least 5 characters before the end of the string.
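As a sanity check, the separated-out pattern can be dropped into Python with re.VERBOSE (which ignores the whitespace) and compared against a plain loop. This is a sketch under the assumption that "increasing" means strictly ascending neighbours; regex_check and plain_check are hypothetical names, and re.ASCII is added so \d means 0-9 only.

```python
import re

# The pattern from above, laid out with re.VERBOSE so whitespace is ignored.
PATTERN = re.compile(r"""
    ^(?=\d+$)\d*?
    (?:
        0(?=[1-9]) | 1(?=[2-9]) | 2(?=[3-9]) | 3(?=[4-9]) | 4(?=[5-9]) |
        5(?=[6-9]) | 6(?=[7-9]) | 7(?=[89])  | 8(?=9)
    ){4}
    \d+
""", re.VERBOSE | re.ASCII)


def regex_check(s: str) -> bool:
    """True if the whole string is accepted by the pattern."""
    return PATTERN.fullmatch(s) is not None


def plain_check(s: str) -> bool:
    """Non-regex reference: only digits 0-9, and some window of 5
    adjacent characters is strictly ascending."""
    if not s or any(c not in "0123456789" for c in s):
        return False
    return any(
        all(s[i + j] < s[i + j + 1] for j in range(4))
        for i in range(len(s) - 4)
    )


if __name__ == "__main__":
    for sample in ["1111345671111", "11111", "9912345", "1234"]:
        print(sample, regex_check(sample), plain_check(sample))
```

The plain loop is also a reasonable alternative when a regex isn't strictly required.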

https://regex101.com/r/03rJET/2

It's ugly, but it works.

Upvotes: 4
