Reputation: 31
Could anybody help me create a regular expression that matches a range of digits excluding the first digit of the range? The problem looks something like this:
([1-9])
(some other metacharacters) [\1-9]
How can I match a digit from the range while excluding the digit stored in \1?
EXAMPLE:
I would like to find numbers that match the following rule: XZ0XYYXZ000X
where X, Y and Z are digits between 1 and 9 (0 < X < Y < Z).
EXAMPLE2: I have a file that contains many lines of random numbers:
2720337
3730447
1362874
etc.
Now I want to extract the lines (for example with grep) that match a certain criterion. For example, the numbers 2720337 and 3730447 match the criterion XZX0YYZ, where X, Y and Z are digits between 1 and 9 with X < Y < Z, and 0 is a literal zero. My attempt was something like this:
([1-9])([\1-9])\1(0)([\1-\2])\3\2
but I cannot find a way to omit the greatest and lowest values from the range [\1-\2], or the lowest value from [\1-9].
Upvotes: 0
Views: 2347
Reputation: 75222
This regex enforces the uniqueness of X, Y and Z:
([1-9])((?!\1)[1-9])\10((?!\1|\2)[1-9])\3\2
...but there's no way to enforce their ordering with back-references alone, because a regex can't compare captured digits numerically.
About the regex:
([1-9]) captures the first digit in group #1. That's the first X in your template.
((?!\1)[1-9]) captures the second digit in group #2, but only after the negative lookahead confirms that it isn't the same as the first digit. That's the Z value.
\1 matches the third digit, which must be the same as the first digit.
0 is obvious: it matches the literal zero.
((?!\1|\2)[1-9]) represents the Y value, so we have to confirm that it's not the same as either of the other two captures. It's captured in group #3.
\3 matches the same digit again; that's the second Y.
\2 matches the Z digit captured earlier, and Bob's your uncle!
Getting back to that 0 again, there's one caveat that I overlooked. In the regex above, \1 is immediately followed by the literal 0, so if there happen to be ten or more capturing groups in the regex, \10 could be interpreted as a backreference to group #10. It's a good idea to break up that kind of sequence whether it needs it or not.
Many regex flavors provide alternative notation that isolates the group reference, like \g{1} in Perl and PCRE patterns (\g<1> and ${1} do the same job in Python and .NET replacement strings, respectively). Not knowing what flavor you're using, I'll use square brackets to isolate the zero instead (i.e., turn it into a single-element character class):
([1-9])((?!\1)[1-9])\1[0]((?!\1|\2)[1-9])\3\2
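To see the caveat in practice, here is a small Python check. Python's re is an assumption here, and other flavors may resolve \10 differently. Python's re reads \10 as a reference to group 10 and refuses to compile the unbracketed pattern. The bracketed version compiles and matches, and so does a non-capturing group (?:0) around the zero.

```python
import re

# Original form: "\1" is immediately followed by the literal zero.
ambiguous = r"([1-9])((?!\1)[1-9])\10((?!\1|\2)[1-9])\3\2"
# Bracketed form: [0] is a single-element character class.
bracketed = r"([1-9])((?!\1)[1-9])\1[0]((?!\1|\2)[1-9])\3\2"
# Another separator: a non-capturing group around the zero.
grouped = r"([1-9])((?!\1)[1-9])\1(?:0)((?!\1|\2)[1-9])\3\2"

try:
    re.compile(ambiguous)
except re.error as exc:
    # Python reads "\10" as a reference to group 10, which does not exist here.
    print("rejected:", exc)

for pattern in (bracketed, grouped):
    print(bool(re.fullmatch(pattern, "2720337")))
```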
Upvotes: 1
Reputation: 2160
Ok, let's give this a try... finally. If your second example means that the numbers share the same pattern of repeated digits at the same positions, you could at least use a regex to check that first:
([1-9])([1-9])(\1)0([1-9])\4\2
This will match 2720337 and 3730447.
The regex captures the digits in groups: $1 is X, $2 is Z and $4 is Y. Then check whether $1 < $4 and $4 < $2, and you're done. If I understood you correctly, that is.
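A minimal sketch of this two-step approach in Python. The language and the helper name matches_rule are assumptions, since the answer only describes the check. The regex fixes the digit layout, and plain integer comparisons on the captured groups enforce X < Y < Z.

```python
import re

# The answer's regex: group 1 = X, group 2 = Z, group 3 = X again, group 4 = Y.
SHAPE = re.compile(r"([1-9])([1-9])(\1)0([1-9])\4\2")


def matches_rule(line: str) -> bool:
    """Hypothetical helper: True if line has the shape XZX0YYZ and X < Y < Z numerically."""
    m = SHAPE.fullmatch(line.strip())
    if m is None:
        return False
    x, z, y = int(m.group(1)), int(m.group(2)), int(m.group(4))
    return x < y < z


lines = ["2720337", "3730447", "1362874", "7270332"]
print([line for line in lines if matches_rule(line)])
```

Here 7270332 has the right digit layout but fails the numeric check, because X = 7 is greater than Y = 3.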
Upvotes: 0
Reputation: 89
I'll assume you're matching a string XY, where 0 < X < Y <= 9. You can easily extend it to your needs.
Unfortunately, it is not possible to use a back-reference in a character class.
The only way I know to do it is by explicitly writing out a case for each value of X:
1[2-9]|2[3-9]|3[4-9]|4[5-9]|5[6-9]|6[7-9]|7[89]|89
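A quick Python check that the hand-written alternation accepts exactly the pairs with X < Y. Python is an assumption; any flavor with alternation would do. The check compares the regex against a numeric test on all 81 two-digit strings made of the digits 1 to 9.

```python
import re
from itertools import product

# The hand-written alternation; fullmatch makes it apply to the whole string.
XY = re.compile(r"1[2-9]|2[3-9]|3[4-9]|4[5-9]|5[6-9]|6[7-9]|7[89]|89")

# Compare against a direct numeric test on every two-digit string of 1-9.
for a, b in product("123456789", repeat=2):
    s = a + b
    assert (XY.fullmatch(s) is not None) == (int(a) < int(b)), s

print("alternation agrees with X < Y for all 81 pairs")
```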
It is possible (e.g., with a negative lookahead) to make sure that Y does not equal X, as in ([1-9])(?!\1)[1-9], but this does not ensure that Y is greater than X.
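The same enumeration idea extends to the XZX0YYZ rule from the question. Instead of writing the cases by hand, a short Python sketch can generate one alternative per ordered triple. The helper build_ordered_pattern is hypothetical. With X < Y < Z drawn from 1 to 9, there are C(9, 3) = 84 alternatives. Each alternative is a literal string, so the ordering is enforced by construction rather than by back-references.

```python
import re
from itertools import combinations


def build_ordered_pattern() -> str:
    """Hypothetical helper: one alternative per (X, Y, Z) with 1 <= X < Y < Z <= 9."""
    alternatives = []
    # combinations() keeps the input order, so each triple satisfies x < y < z.
    for x, y, z in combinations("123456789", 3):
        alternatives.append(f"{x}{z}{x}0{y}{y}{z}")  # the XZX0YYZ layout
    return "|".join(alternatives)


ORDERED = re.compile(build_ordered_pattern())

lines = ["2720337", "3730447", "1362874", "7270332"]
print(len(ORDERED.pattern.split("|")))  # 84 alternatives, i.e. C(9, 3)
print([line for line in lines if ORDERED.fullmatch(line)])
```

Because the generated pattern uses only digits and |, it should also work as an extended regex with grep -E -x. However, a pattern with 84 alternatives is unwieldy to maintain by hand.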
Upvotes: 0