user3239711

Decode a regex to know if it allows numbers only

I receive a regex as a string. Is it possible to know whether the regex only permits numbers? The regexes I receive are mainly of the form:

But I may receive other regexes too.

Answers (2)

degant

As most people have pointed out, this is a particularly complex task to achieve with a single regex, since the same thing can be written in many ways, including digits hidden inside character classes, negated character classes, etc. Nonetheless, I gave it a shot and tested it a bit; it works for basic scenarios.

The regex below matches any regex that matches only digits and no other characters. Whether that regex allows one or more digits, or only particular digits, doesn't matter; the checking regex only ensures that the matched regex can't match any non-digit.

  • The regex matches various ways of representing digits, including \d, [0-9], \p{N}, [123] and even literals such as 4, but not negated character classes such as [^\WA-Za-z_], nor ranges that include non-digits such as [.-:]
  • The regex matches a regex with or without the ^ and $ anchors
  • The regex supports all quantifiers, including *, +, ? and {x,y}. It also works with non-greedy and possessive quantifiers, i.e. \d*? and \d*+
  • The regex works with both positive and negative lookbehinds and lookaheads.
  • The regex supports alternation with |, such as \d?|[34]?|123

Limitations:

  • The regex doesn't support capturing or non-capturing groups, since that makes it quite complex. So any regex containing a (..) capturing group or a (?:..) non-capturing group will fail, even though it might be digit-only
  • The regex doesn't support negated character classes. For example, [^\WA-Za-z_] matches only digits, but it is rejected.
  • This isn't exactly a limitation, but note that the input regex is NOT validated.

Regex:

^\^?((\(\?\<[=!][^\(\)]*?\))?(\[\d*(?:\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(?:\|\d+)*)(\*|\+|\?|\{\d*,?\d*\})?(\?|\+)?(\(\?[=!][^\(\)]*?\))?)+(?:\|(?:(?:\(\?\<[=!][^\(\)]*?\))?(\[\d*(\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(\|\d+)*)(\*|\+|\?|\{\d*,?\d*\})?(\?|\+)?(\(\?[=!][^\(\)]*?\))?))*\$?$
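A minimal sketch of applying the regex above from code, assuming Python's `re` module (the question doesn't name a language or regex engine). The pattern is copied verbatim, split across adjacent raw-string literals only for line length; `looks_digit_only`, `ACCEPTED` and `REJECTED` are hypothetical names for this illustration, and the sample inputs come from the lists further down in this answer.

```python
import re

# The checking regex from this answer, copied verbatim; Python concatenates
# the adjacent raw strings, so this is the same single pattern.
DIGIT_ONLY_CHECKER = re.compile(
    r"^\^?((\(\?\<[=!][^\(\)]*?\))?(\[\d*(?:\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(?:\|\d+)*)"
    r"(\*|\+|\?|\{\d*,?\d*\})?(\?|\+)?(\(\?[=!][^\(\)]*?\))?)+"
    r"(?:\|(?:(?:\(\?\<[=!][^\(\)]*?\))?(\[\d*(\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(\|\d+)*)"
    r"(\*|\+|\?|\{\d*,?\d*\})?(\?|\+)?(\(\?[=!][^\(\)]*?\))?))*\$?$"
)


def looks_digit_only(pattern: str) -> bool:
    """True if the regex source text `pattern` is accepted by the checker."""
    return DIGIT_ONLY_CHECKER.fullmatch(pattern) is not None


# Samples from the "Works for" / "Doesn't work for" lists, plus a few regexes
# that clearly match non-digits.
ACCEPTED = [r"^[0-9]{6}$", r"^[0-9]+$", r"^[0-9]{5,10}$", r"\d[0][3-9]*?\d[0-7]*?$",
            r"\d*|[0-9]+|123", r"\d+(?!\s)", r"(?<=\w)[0-9]"]
REJECTED = [r"[a-z]+", r"\w+", r"[.-:]", r"(\d)*", r"(?:\d+)", r"^[^\WA-Za-z_]"]

for candidate in ACCEPTED + REJECTED:
    print(f"{candidate:26} {looks_digit_only(candidate)}")
```

Note that (\d)*, (?:\d+) and ^[^\WA-Za-z_] end up in the rejected group even though they are digit-only; these are the limitations listed above.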

An easier way to visualise the solution is:

^((lookbehind)?(digit_classes)(quantifier)?(quantifier_type)?(lookahead)?)+

lookbehind = (?<=.. or (?<!..
digit_classes = \d or [0-9] or \p{N} etc.
quantifier = * or + or ? or {,}
quantifier_type = ? or +
lookahead = (?=.. or (?!..

// Repeat the above to support 'OR' i.e |
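The pseudo-regex above can be turned back into a working pattern by gluing named pieces together. This is a sketch in Python's `re`, not the answer's original code: the constant names mirror the pseudo-regex labels and are otherwise hypothetical, and every group is non-capturing, so it should match the same inputs as the one-liner without being a character-for-character copy.

```python
import re

# Named pieces mirroring the pseudo-regex labels; all groups are non-capturing.
LOOKBEHIND = r"(?:\(\?<[=!][^()]*?\))?"        # (?<=...) or (?<!...)
DIGIT_CLASSES = (r"(?:\[\d*(?:\d-\d)?\d*\]"    # [0-9], [1234], [1-3567]
                 r"|\\d"                       # \d
                 r"|\\p\{N\}"                  # \p{N}
                 r"|\d+(?:\|\d+)*)")           # literals such as 4 or 4|6|8
QUANTIFIER = r"(?:\*|\+|\?|\{\d*,?\d*\})?"     # *, +, ? or {x,y}
QUANTIFIER_TYPE = r"(?:\?|\+)?"                # lazy ? or possessive +
LOOKAHEAD = r"(?:\(\?[=!][^()]*?\))?"          # (?=...) or (?!...)

UNIT = LOOKBEHIND + DIGIT_CLASSES + QUANTIFIER + QUANTIFIER_TYPE + LOOKAHEAD

# Optional ^, one or more units, any number of "|unit" alternatives, optional $.
CHECKER = re.compile(r"^\^?(?:" + UNIT + r")+(?:\|" + UNIT + r")*\$?$")

for candidate in [r"^[0-9]{6}$", r"\d*|[0-9]+|123", r"(?<=\w)[0-9]", r"[a-z]+", r"(\d)*"]:
    print(f"{candidate:16} {CHECKER.fullmatch(candidate) is not None}")
```

Keeping the pieces separate makes it easier to extend one of them, e.g. adding another digit representation to `DIGIT_CLASSES`, without editing the whole one-liner.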

The first capturing group, ((\(\?\<[=!][^\(\)]*?\))?(\[\d*(?:\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(?:\|\d+)*)(\*|\+|\?|\{\d*,?\d*\})?(\?|\+)?(\(\?[=!][^\(\)]*?\))?)+, covers all the digit representations and is described in detail below.

  1. First inner capturing group (\(\?\<[=!][^\(\)]*?\))? matches positive or negative lookbehinds
    • \(\?\< matches the start of a lookbehind, i.e. (?<, followed by [=!] since it can be positive or negative
    • [^\(\)]*? non-greedily allows any character other than ( or ) inside the lookbehind
  2. Next capturing group (\[\d*(?:\d-\d)?\d*\]|\\d|\\p\{N\}|\d+(?:\|\d+)*) matches various digit representations such as \d, [0-9] or \p{N}
    • \[\d*(?:\d-\d)?\d*\] matches [0-9], [1234] or even [1-3567]
    • \\d matches \d literally
    • \\p\{N\} matches \p{N} literally
    • \d+(?:\|\d+)* allows literals such as 4, and multiple literals such as 4|6|8
  3. Next capturing group (\*|\+|\?|\{\d*,?\d*\})? matches all quantifiers, i.e. *, +, ? and {x,y}
    • \*|\+|\? covers the basic quantifiers
    • \{\d*,?\d*\} supports quantifiers with minimum and maximum counts, such as \d{5,} or [0-9]{3,6}
  4. Next capturing group (\?|\+)? marks the quantifier type: lazy (\d*?) or possessive (\d*+)
  5. Next capturing group (\(\?[=!][^\(\)]*?\))? allows positive or negative lookaheads
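Each sub-pattern in the breakdown above can be exercised on its own against the examples given for it. A small Python sketch, assuming `re.fullmatch` semantics (the whole sample must match); the dictionary labels and the extra negative samples are illustrative additions, not part of the original answer.

```python
import re

# Each sub-pattern from the breakdown, with sample regex source text it should
# accept in full (good) and sample text it should reject (bad).
PIECES = {
    "lookbehind": (r"\(\?\<[=!][^\(\)]*?\)", [r"(?<=\w)", r"(?<!x)"], [r"(?=\w)", r"(\d)"]),
    "digit class": (r"\[\d*(?:\d-\d)?\d*\]", [r"[0-9]", r"[1234]", r"[1-3567]"], [r"[a-z]", r"[.-:]"]),
    "escaped d": (r"\\d", [r"\d"], [r"\w"]),
    "p{N}": (r"\\p\{N\}", [r"\p{N}"], [r"\p{L}"]),
    "literals": (r"\d+(?:\|\d+)*", ["4", "4|6|8"], ["4|", "a"]),
    "quantifier": (r"\*|\+|\?|\{\d*,?\d*\}", ["*", "+", "?", "{5,}", "{3,6}"], ["{a}"]),
    "quantifier type": (r"\?|\+", ["?", "+"], ["*"]),
    "lookahead": (r"\(\?[=!][^\(\)]*?\)", [r"(?!\s)", r"(?=x)"], [r"(?<=x)"]),
}

RESULTS = {}
for name, (pattern, good, bad) in PIECES.items():
    compiled = re.compile(pattern)
    accepts_good = all(compiled.fullmatch(s) for s in good)
    rejects_bad = not any(compiled.fullmatch(s) for s in bad)
    RESULTS[name] = accepts_good and rejects_bad
    print(f"{name:16} {'ok' if RESULTS[name] else 'MISMATCH'}")
```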

After this, the first capturing group is repeated once more to support | between multiple digit representations. That is, if the groups above are represented by (..)+, then to support | the unit is duplicated as (..)+(\|(..))*, which gives the final regex.

Works for:

^[0-9]{6}$
^[0-9]+$
^[0-9]{5,10}$
\d[0][3-9]*?\d[0-7]*?$
\d*|[0-9]+|123
\d+(?!\s)
(?<=\w)[0-9]

Doesn't work for (but should work):

(\d)*          # Capturing groups don't work
(?:\d+)        # Non-capturing groups don't work
^[^\WA-Za-z_]  # Negated character classes don't work
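One way to check that the three patterns above really are digit-only (so rejecting them is a false negative) is to probe them with every printable ASCII character. This Python sketch is a rough check, not a proof: it only tries one-character inputs, and it compiles with `re.ASCII`, because in Python's default Unicode mode `\w`, and therefore `[^\WA-Za-z_]`, also admits letters such as `é`.

```python
import re
import string

# Patterns the checker rejects even though they only ever match digits.
FALSE_NEGATIVES = [r"(\d)*", r"(?:\d+)", r"^[^\WA-Za-z_]"]

DIGIT_ONLY = {}
for pattern in FALSE_NEGATIVES:
    compiled = re.compile(pattern, re.ASCII)
    # Every printable ASCII character that the pattern matches in full.
    matched = {c for c in string.printable if compiled.fullmatch(c)}
    DIGIT_ONLY[pattern] = matched == set(string.digits)
    print(f"{pattern:16} only digits: {DIGIT_ONLY[pattern]}")
```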

Note: most groups are capturing only so that they are easier to visualise; they can all be converted to non-capturing groups at any time.

Bálint

^(\d|(?<!\^)\d-\d|\\d|\^|\$|\[|\]|{\d+(,\d+)?}|\+|\*|\\b|\\B|\\\d|\(\?[:=!<][^]+\)|\?|\||\((\d|(?<!\^)\d-\d|\\d|\^|\$|\[|\]|{\d+(,\d+)?}|\+|\*|\\b|\\B|\\\d|\(\?[:=!<][^]+\)|\?|\|)+\))+$

I know...I know

This only matches strings that could appear in a regular expression that matches digits. This includes (?=My phone number is: )[\d-]+, which matches 123-4567-890 in My phone number is: 123-4567-890.

To test whether a regex only matches digits, try matching its source text against this pattern. If it matches, the regex is okay.

This doesn't reject invalid ones, e.g. \d^\d$\d
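A rough Python port of the regex above, for experimenting with it. Assumptions: the original uses [^] (any character), which is JavaScript syntax; Python's `re` reads a `]` right after `[^` as a literal, so it is replaced with `[\s\S]`, and the bare `{` and `}` are escaped to make the literal braces explicit. `TOKEN`, `CHECKER` and `accepts` are hypothetical names; the alternation is otherwise unchanged.

```python
import re

# One alternative of the original pattern: a single token that may appear in a
# digit-only regex. JavaScript's [^] is ported as [\s\S]; braces are escaped.
TOKEN = (r"\d|(?<!\^)\d-\d|\\d|\^|\$|\[|\]|\{\d+(,\d+)?\}|\+|\*"
         r"|\\b|\\B|\\\d|\(\?[:=!<][\s\S]+\)|\?|\|")

# One or more tokens, or parenthesised runs of tokens, covering the whole input.
CHECKER = re.compile(r"^(" + TOKEN + r"|\((" + TOKEN + r")+\))+$")


def accepts(candidate: str) -> bool:
    """True if the regex source text `candidate` passes the check."""
    return CHECKER.fullmatch(candidate) is not None


for candidate in [r"^\d+$", r"^[0-9]{6}$", r"[a-z]+", r"\w+", r"\d^\d$\d", r"(?:[a-z]+)"]:
    print(f"{candidate:12} {accepts(candidate)}")
```

As the answer says, \d^\d$\d is accepted. In this port (?:[a-z]+) is accepted too, since `[\s\S]+` lets anything through between `(?` and `)`.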

If you notice any errors in it, then please let me know, and I'll correct it.
