d3ky

Reputation: 31

How to match range of numbers excluding first digit

Could anybody help me create a regular expression that matches a digit from a range, excluding a digit that was already matched earlier? The problem looks something like this:

([1-9]) some other meta characters [\1-9]

How can I match a digit from a range that excludes the digit stored in \1?

EXAMPLE: I would like to find numbers that match the pattern XZ0XYYXZ000X, where X, Y and Z are digits between 1 and 9 (0 < X < Y < Z).

EXAMPLE2: I have a file that contains a lot of lines with random numbers:

2720337
3730447
1362874
etc.

Now I want to extract the lines (for example with grep) that match certain criteria. For example, the numbers 2720337 and 3730447 match the pattern XZX0YYZ, where X, Y and Z are digits between 1 and 9 with X < Y < Z, and 0 is a literal zero. My attempt was something like ([1-9])([\1-9])\1(0)([\1-\2])\3\2, but I cannot find a way to exclude the greatest and lowest values from the range [\1-\2], or the lowest value from [\1-9].

Upvotes: 0

Views: 2347

Answers (3)

Alan Moore

Reputation: 75222

This regex enforces the uniqueness of X, Y and Z:

([1-9])((?!\1)[1-9])\10((?!\1|\2)[1-9])\3\2

...but there's no way to enforce their ordering with a regex.


About the regex:

([1-9]) captures the first digit in group #1. That's the first X in your template.

((?!\1)[1-9]) captures the second digit in group #2, but only after the negative lookahead confirms that it isn't the same as the first digit. That's the Z value.

\1 matches the third digit, assuming it's the same as the first digit.

0 is obvious

((?!\1|\2)[1-9]) represents the Y value, so we have to confirm that it's not the same as either of the other two captures. It's captured in group #3.

\3 matches the same digit again; that's the second Y.

\2 matches another of whatever the Z value was, and Bob's your uncle!

Getting back to that 0 again, there's one caveat that I overlooked. If there happen to be ten or more capturing groups in the regex, \10 could be interpreted as a backreference to group #10. It's a good idea to break up that kind of thing whether it needs it or not.

Many regex flavors provide alternative notation that isolates the group reference, like \g<1> or ${1}. Not knowing what flavor you're using, I'll use square brackets to isolate the zero instead (i.e., turn it into a single-element character class):

([1-9])((?!\1)[1-9])\1[0]((?!\1|\2)[1-9])\3\2
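As a rough illustration, here is how the bracketed form of the pattern might be tried in Python's `re` module, which is an assumption since the question does not name a regex flavor. The function name `has_distinct_digits` and the extra sample values are made up for this sketch. The last sample shows the limitation mentioned above: the digits are distinct, but their ordering is not checked.

```python
import re

# Bracketed-zero form, so "\1" followed by "0" cannot be read as "\10".
PATTERN = re.compile(r"([1-9])((?!\1)[1-9])\1[0]((?!\1|\2)[1-9])\3\2")

def has_distinct_digits(line: str) -> bool:
    """True if the whole line fits XZX0YYZ with X, Y and Z pairwise different."""
    return PATTERN.fullmatch(line) is not None

# The first three values come from the question; the last two are
# hypothetical inputs added for this sketch.
for sample in ["2720337", "3730447", "1362874", "2720227", "7370553"]:
    print(sample, has_distinct_digits(sample))
```

Here 2720227 is rejected because Y would equal X, while 7370553 is accepted even though X > Y > Z, so the ordering still has to be checked some other way.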

Upvotes: 1

Leif

Reputation: 2160

OK, let's give this a try... finally. If your second example means the numbers must share the same pattern of repeated digits at the same positions, you could at least use a regex to check that part first:

([1-9])([1-9])(\1)0([1-9])\4\2

This will match 2720337 and 3730447.

The regex captures the relevant digits: $1 is X, $2 is Z and $4 is Y. Check that $1 < $4 and $4 < $2, and you're done. If I understood you correctly, that is.
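The two-step check described above could look roughly like this in Python. The language is an assumption, and `matches_xzx0yyz` is a hypothetical helper name. The regex only verifies the shape; the ordering is compared in ordinary code afterwards.

```python
import re

# Shape check only: group 1 is X, group 2 is Z, group 4 is Y.
SHAPE = re.compile(r"([1-9])([1-9])(\1)0([1-9])\4\2")

def matches_xzx0yyz(line: str) -> bool:
    """True if the line has the shape XZX0YYZ and satisfies X < Y < Z."""
    m = SHAPE.fullmatch(line.strip())
    if m is None:
        return False
    x, z, y = int(m.group(1)), int(m.group(2)), int(m.group(4))
    return x < y < z

# Sample values from the question; only the first two should pass.
for sample in ["2720337", "3730447", "1362874"]:
    print(sample, matches_xzx0yyz(sample))
```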

Upvotes: 0

ahven

Reputation: 89

I'll assume you're matching a string XY, where 0 < X < Y <= 9. You can easily extend it to your needs.

Unfortunately, it is not possible to use a back-reference in a character class. The only way I know to do it is to write out a case explicitly for each value of X: 1[2-9]|2[3-9]|3[4-9]|4[5-9]|5[6-9]|6[7-9]|7[89]|89.

It would be possible (e.g. using a negative look-ahead) to make sure that Y does not equal X, as in ([1-9])(?!\1)[1-9], but this does not guarantee that Y is not less than X.
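The case-by-case enumeration can be generated rather than typed by hand. The following is only a sketch, assuming Python and the XZX0YYZ template from the question; `ordered_pattern` is a hypothetical helper name. It writes out one literal alternative for every triple with X < Y < Z, which gives 84 alternatives.

```python
import re
from itertools import combinations

def ordered_pattern() -> str:
    """Build an anchored alternation listing every XZX0YYZ with X < Y < Z."""
    # combinations() over a sorted input yields each triple in increasing
    # order, so x < y < z holds for every alternative.
    alternatives = [
        f"{x}{z}{x}0{y}{y}{z}" for x, y, z in combinations("123456789", 3)
    ]
    return "^(" + "|".join(alternatives) + ")$"

pattern = ordered_pattern()
# Sample values from the question.
for sample in ["2720337", "3730447", "1362874"]:
    print(sample, re.search(pattern, sample) is not None)
```

Because the generated pattern uses only literals, alternation and anchors, it should also work with tools such as grep -E, at the cost of being long.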

Upvotes: 0
