oz123

Python regular expressions: match a range of numbers with a separator

I would like the user to enter a range of numbers to be printed, e.g.:

27-84

I came up with the following regex:

^(?P<Begin>\d+)((-?)(?P<end>\d+)?)$

So it matches 27-84 with the following groups:

Group   Span   Match
Begin   0-2    27
2       2-5    -84
3       2-3    -
end     3-5    84

This seems OK, but I would like to hear whether there is a more elegant way, or whether there are caveats I am overlooking.
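
The group listing above can be reproduced by printing each group's span and text. This is a small sketch assuming Python 3; the pattern is the one from the question, written as a raw string:

```python
import re

# The pattern from the question, written as a raw string
pattern = re.compile(r'^(?P<Begin>\d+)((-?)(?P<end>\d+)?)$')

m = pattern.match('27-84')
for group in ('Begin', 2, 3, 'end'):
    # span() returns the (start, end) offsets listed as 0-2, 2-5, ... above
    print(group, m.span(group), repr(m.group(group)))
```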

clarification:

Answers (2)

poke

(Original answer below)

Okay, with your clarification update, you won’t be able to do that as easily with regular expressions, unless you also accept a lone - as meaning “from beginning to end”. In that case, you could use this:

>>> import re
>>> r = re.compile(r'^((?P<fixed>\d+)|(?P<begin>\d+)?-(?P<end>\d+)?)$')
>>> r.match('123').groupdict()
{'fixed': '123', 'begin': None, 'end': None}
>>> r.match('123-').groupdict()
{'fixed': None, 'begin': '123', 'end': None}
>>> r.match('-456').groupdict()
{'fixed': None, 'begin': None, 'end': '456'}
>>> r.match('-').groupdict()
{'fixed': None, 'begin': None, 'end': None}
>>> r.match('123-456').groupdict()
{'fixed': None, 'begin': '123', 'end': '456'}

Otherwise, you will have to use more explicit cases:

>>> r = re.compile(r'^((?P<fixed>\d+)|-(?P<endonly>\d+)|(?P<begin>\d+)(?:-(?P<end>\d+)?))$')
>>> r.match('123').groupdict()
{'fixed': '123', 'endonly': None, 'begin': None, 'end': None}
>>> r.match('123-').groupdict()
{'fixed': None, 'endonly': None, 'begin': '123', 'end': None}
>>> r.match('-456').groupdict()
{'fixed': None, 'endonly': '456', 'begin': None, 'end': None}
>>> r.match('123-456').groupdict()
{'fixed': None, 'endonly': None, 'begin': '123', 'end': '456'}
>>> r.match('-') is None
True

As you can see, interpreting the results afterwards becomes more complex.
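
For instance, turning a match of the second pattern into a (begin, end) pair means checking each named group in turn. A minimal sketch, where to_bounds is a hypothetical helper name and None stands for an open end:

```python
import re

# Second pattern from above, written as a raw string
RANGE_RE = re.compile(
    r'^((?P<fixed>\d+)|-(?P<endonly>\d+)|(?P<begin>\d+)(?:-(?P<end>\d+)?))$')


def to_bounds(text):
    """Hypothetical helper: return (begin, end); None means open-ended."""
    m = RANGE_RE.match(text)
    if m is None:
        raise ValueError(f'Not a correct range: {text!r}')
    g = m.groupdict()
    # Exactly one alternative matched, so check the groups one by one
    if g['fixed'] is not None:
        n = int(g['fixed'])
        return n, n
    if g['endonly'] is not None:
        return None, int(g['endonly'])
    begin = int(g['begin'])
    end = int(g['end']) if g['end'] is not None else None
    return begin, end
```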

However, I would recommend moving away from regular expressions for this and doing the parsing directly in Python, for example like this:

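import re

def parseRange(expr):
    # Allow optional digits, an optional dash, then optional digits
    if not expr or not re.match(r'^\d*-?\d*$', expr):
        raise ValueError('Not a correct range')

    # single number
    if '-' not in expr:
        return int(expr), int(expr)
    else:
        # Split into the two (possibly empty) halves around the dash
        begin, end = expr.split('-')
        # An empty half means the range is open on that side
        begin = float('-inf') if begin == '' else int(begin)
        end = float('inf') if end == '' else int(end)
        return begin, end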

Used like this:

>>> parseRange('123')
(123, 123)
>>> parseRange('123-')
(123, inf)
>>> parseRange('-456')
(-inf, 456)
>>> parseRange('123-456')
(123, 456)
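
If you want to avoid regular expressions entirely, str.partition and str.isdigit can do the validation as well. This is an alternative sketch, not the code above; parse_range is a hypothetical name, it assumes Python 3.7+ (for str.isascii), and unlike parseRange it rejects a lone - instead of returning (-inf, inf):

```python
def parse_range(expr):
    """Parse 'a', 'a-', '-b' or 'a-b' into a (begin, end) tuple."""
    # partition() splits at the first '-' only; sep is '' if there is none
    begin, sep, end = expr.partition('-')

    # Each side must be empty or consist only of ASCII digits
    for part in (begin, end):
        if part and not (part.isascii() and part.isdigit()):
            raise ValueError('Not a correct range')

    # At least one side needs a number, so '' and a lone '-' are rejected
    if not begin and not end:
        raise ValueError('Not a correct range')

    # single number
    if not sep:
        return int(begin), int(begin)

    # An empty side means the range is open in that direction
    return (int(begin) if begin else float('-inf'),
            int(end) if end else float('inf'))
```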

Original answer

You can get rid of some of those capturing groups to simplify it:

^(?P<begin>\d+)-(?P<end>\d+)$

This also fixes two problems: the - could be present without a second number, and, in theory (though not in practice, because \d+ is greedy), begin and end could be separated by nothing.
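
To see the difference, here is a small comparison of the question’s pattern and the simplified one on a few example inputs (a sketch; the test strings are arbitrary):

```python
import re

# Pattern from the question and the simplified pattern, as raw strings
question_re = re.compile(r'^(?P<Begin>\d+)((-?)(?P<end>\d+)?)$')
simple_re = re.compile(r'^(?P<begin>\d+)-(?P<end>\d+)$')

for text in ('27-84', '27-', '2784'):
    q = question_re.match(text)
    s = simple_re.match(text)
    print(repr(text),
          'question:', q.groupdict() if q else None,
          'simplified:', s.groupdict() if s else None)
```

Both patterns accept 27-84, but only the question’s pattern accepts 27- (a dash with no second number) and 2784 (all digits end up in Begin).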

If you don’t always want to match a range but also want to allow a single number, you can put the range part in an optional non-capturing group. That way you don’t end up with additional groups:

^(?P<begin>\d+)(?:-(?P<end>\d+))?$

And finally, if you want to accept “to” as a separator as well, you can do that too:

^(?P<begin>\d+)(?:(?:-|\s+to\s+)(?P<end>\d+))?$

>>> r = re.compile(r'^(?P<begin>\d+)(?:-(?P<end>\d+))?$')
>>> r.match('123').groupdict()
{'begin': '123', 'end': None}
>>> r.match('123-456').groupdict()
{'begin': '123', 'end': '456'}

>>> r = re.compile(r'^(?P<begin>\d+)(?:(?:-|\s+to\s+)(?P<end>\d+))?$')
>>> r.match('123-456').groupdict()
{'begin': '123', 'end': '456'}
>>> r.match('123 to 456').groupdict()
{'begin': '123', 'end': '456'}

Tim Pietzcker

To allow and correctly handle all the cases you mentioned in your edit, you need a different regex:

^(?=.*\d)(?P<begin>\d*)(?P<range>-?)(?P<end>\d*)$

The lookahead at the start is necessary to assert that there is at least one digit in the string (because begin and end are each optional, but they cannot both be missing).

As a verbose regex with explanations:

^              # Start of string
(?=.*\d)       # Assert that there is at least one digit somewhere
(?P<begin>\d*) # Match 0 or more digits --> Begin
(?P<range>-?)  # Match 0 or 1 dash
(?P<end>\d*)   # Match 0 or more digits --> End
$              # End of string

See it live on regex101.com.
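
The commented version can be used directly in Python with re.VERBOSE, which ignores the whitespace and the # comments. A minimal sketch; bounds is a hypothetical helper that turns the three groups into a (begin, end) tuple, with None meaning an open end:

```python
import re

RANGE_RE = re.compile(r"""
    ^              # Start of string
    (?=.*\d)       # Assert that there is at least one digit somewhere
    (?P<begin>\d*) # Match 0 or more digits --> Begin
    (?P<range>-?)  # Match 0 or 1 dash
    (?P<end>\d*)   # Match 0 or more digits --> End
    $              # End of string
""", re.VERBOSE)


def bounds(text):
    """Hypothetical helper: return (begin, end); None means open-ended."""
    m = RANGE_RE.match(text)
    if m is None:
        raise ValueError(f'Not a correct range: {text!r}')
    begin, dash, end = m.group('begin', 'range', 'end')
    # No dash: begin has consumed all digits, so this is a single number
    if not dash:
        n = int(begin)
        return n, n
    return (int(begin) if begin else None,
            int(end) if end else None)
```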
