Python regular expressions: match a range of numbers with a separator

Question

I would like the user to enter a range of numbers to be printed, e.g.:

27-84

I came up with the following regex:

^(?P<begin>\d+)((-?)(?P<end>\d+)?)$

It matches 27-84 with the following groups:

begin   [0-2]   `27`
2.      [2-5]   `-84`
3.      [2-3]   `-`
end     [3-5]   `84`

This seems OK, but I would like to hear whether there is a more elegant way, or whether there are pitfalls I am overlooking.

Clarification:

A single number like 27 is allowed.
A number followed by a dash, like 27-, should be understood as 27 to the end.
A number preceded by a dash, like -10, should be understood as the beginning to 10.
A range like 10-27 should be understood as 10 to 27, inclusive.

poke · Accepted Answer

(Original answer below)

Okay, with your clarification update, you won’t be able to do that easily with regular expressions, unless you also accept just a - as “from beginning to end”. In that case, you could use this:

>>> import re
>>> r = re.compile(r'^((?P<fixed>\d+)|(?P<begin>\d+)?-(?P<end>\d+)?)$')
>>> r.match('123').groupdict()
{'begin': None, 'fixed': '123', 'end': None}
>>> r.match('123-').groupdict()
{'begin': '123', 'fixed': None, 'end': None}
>>> r.match('-456').groupdict()
{'begin': None, 'fixed': None, 'end': '456'}
>>> r.match('-').groupdict()
{'begin': None, 'fixed': None, 'end': None}
>>> r.match('123-456').groupdict()
{'begin': '123', 'fixed': None, 'end': '456'}

Otherwise, you will have to use more explicit cases:

>>> r = re.compile(r'^((?P<fixed>\d+)|-(?P<endonly>\d+)|(?P<begin>\d+)(?:-(?P<end>\d+)?))$')
>>> r.match('123').groupdict()
{'endonly': None, 'begin': None, 'fixed': '123', 'end': None}
>>> r.match('123-').groupdict()
{'endonly': None, 'begin': '123', 'fixed': None, 'end': None}
>>> r.match('-456').groupdict()
{'endonly': '456', 'begin': None, 'fixed': None, 'end': None}
>>> r.match('123-456').groupdict()
{'endonly': None, 'begin': '123', 'fixed': None, 'end': '456'}
>>> r.match('-') is None
True

As you can see, the more cases the expression distinguishes, the more work it takes to interpret the results correctly afterwards.
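To make that interpretation step concrete, here is a minimal sketch of how the groups of the more explicit pattern could be turned into a (begin, end) pair. The helper name interpret and the use of None for an open side are assumptions made for this illustration, not part of the answer above.

```python
import re

# The explicit-case pattern from above, with its four named groups.
RANGE_RE = re.compile(
    r'^((?P<fixed>\d+)|-(?P<endonly>\d+)|(?P<begin>\d+)(?:-(?P<end>\d+)?))$'
)

def interpret(expr):
    """Hypothetical helper: return (begin, end); None marks an open side."""
    m = RANGE_RE.match(expr)
    if m is None:
        raise ValueError('Not a correct range: %r' % expr)
    g = m.groupdict()
    if g['fixed'] is not None:       # "27"
        n = int(g['fixed'])
        return n, n
    if g['endonly'] is not None:     # "-10"
        return None, int(g['endonly'])
    # "27-" or "10-27": begin is always set here, end may be missing
    end = int(g['end']) if g['end'] is not None else None
    return int(g['begin']), end
```

Every named group needs its own branch, which is the extra bookkeeping the regex-only approach brings with it.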

However, I would recommend moving away from regular expressions for this and doing the parsing directly in Python, for example like this:

import re

def parseRange(expr):
    # Accept only digits with at most one dash in between.
    if not expr or not re.match(r'^\d*-?\d*$', expr):
        raise ValueError('Not a correct range')

    # single number
    if '-' not in expr:
        return int(expr), int(expr)
    else:
        # an empty side is open-ended; a lone '-' yields (-inf, inf)
        begin, end = expr.split('-')
        begin = float('-inf') if begin == '' else int(begin)
        end = float('inf') if end == '' else int(end)
        return begin, end

Used like this:

>>> parseRange('123')
(123, 123)
>>> parseRange('123-')
(123, inf)
>>> parseRange('-456')
(-inf, 456)
>>> parseRange('123-456')
(123, 456)
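
Since the goal in the question is to print the numbers, the infinite bounds still have to be mapped to something finite at some point. The following is a rough sketch of one way to do that. The helper numbers_in and its lowest/highest limits are assumptions for illustration, and parseRange is repeated only so the block runs on its own.

```python
import re

def parseRange(expr):
    # Same logic as the function in the answer above.
    if not expr or not re.match(r'^\d*-?\d*$', expr):
        raise ValueError('Not a correct range')
    if '-' not in expr:
        return int(expr), int(expr)
    begin, end = expr.split('-')
    begin = float('-inf') if begin == '' else int(begin)
    end = float('inf') if end == '' else int(end)
    return begin, end

def numbers_in(expr, lowest, highest):
    """Hypothetical helper: inclusive list of numbers within [lowest, highest]."""
    begin, end = parseRange(expr)
    # max/min swap an infinite bound for the finite integer limit
    first = max(begin, lowest)
    last = min(end, highest)
    return list(range(first, last + 1))

print(numbers_in('3-6', 1, 10))
print(numbers_in('8-', 1, 10))
print(numbers_in('-3', 1, 10))
```

Using float('inf') as the open bound is what makes the plain max/min comparison work here without special cases.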

Original answer

You can just get rid of some of those capturing groups to simplify it:

^(?P<begin>\d+)-(?P<end>\d+)$

This also fixes two problems: that the - could be there without a second number, and that theoretically (though not in practice) begin and end could be separated by nothing.
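
A quick way to see both points is to run the two patterns side by side. This is only a small sketch; the variable names loose and strict are made up for the comparison.

```python
import re

# The pattern from the question, with the group names "begin" and "end".
loose = re.compile(r'^(?P<begin>\d+)((-?)(?P<end>\d+)?)$')
# The simplified pattern from this answer.
strict = re.compile(r'^(?P<begin>\d+)-(?P<end>\d+)$')

# A dash without a second number: accepted by loose, rejected by strict.
print(loose.match('27-').groupdict())
print(strict.match('27-'))

# No separator at all: the greedy first \d+ takes every digit,
# so "end" stays empty rather than the string being split in two.
print(loose.match('2784').groupdict())
```

The last case is why the missing separator is only a theoretical problem: the input still matches, but never as two numbers.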

If you don’t always want to match a range but also want to allow a single number, you can put the range part in a non-capturing group. That way you don’t end up with additional groups:

^(?P<begin>\d+)(?:-(?P<end>\d+))?$

And finally, if you want to accept “to” as a separator as well, you can even do that:

^(?P<begin>\d+)(?:(?:-|\s+to\s+)(?P<end>\d+))?$

>>> r = re.compile(r'^(?P<begin>\d+)(?:-(?P<end>\d+))?$')
>>> r.match('123').groupdict()
{'begin': '123', 'end': None}
>>> r.match('123-456').groupdict()
{'begin': '123', 'end': '456'}

>>> r = re.compile(r'^(?P<begin>\d+)(?:(?:-|\s+to\s+)(?P<end>\d+))?$')
>>> r.match('123-456').groupdict()
{'begin': '123', 'end': '456'}
>>> r.match('123 to 456').groupdict()
{'begin': '123', 'end': '456'}
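
One pitfall worth knowing about when the input comes straight from a user: without re.MULTILINE, $ also matches just before a trailing newline, so an unstripped line still passes. A possible alternative, sketched here under the assumption that Python 3.4 or later is available, is re.fullmatch, which makes the ^ and $ anchors unnecessary. The variable names are illustrative only.

```python
import re

# Same pattern as above, but without the ^ and $ anchors.
pattern = r'(?P<begin>\d+)(?:(?:-|\s+to\s+)(?P<end>\d+))?'
anchored = re.compile('^' + pattern + '$')

# "$" tolerates one trailing newline; fullmatch does not.
print(anchored.match('123-456\n') is not None)
print(re.fullmatch(pattern, '123-456\n') is None)

# On clean input both behave the same way.
print(re.fullmatch(pattern, '123 to 456').groupdict())
```

Whether the trailing newline should be accepted depends on how the input is read; stripping it before matching avoids the question entirely.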
