Reputation: 28858
I would like the user to enter a range of numbers to be printed, e.g.:
27-84
I came up with the following regex:
^(?P<Begin>\d+)((-?)(?P<end>\d+)?)$
So it will match: 27-84
as the following groups:
Begin [0-2] `27`
2. [2-5] `-84`
3. [2-3] `-`
end [3-5] `84`
Seems OK, but I would like to hear if there is a more elegant way, or if there are pitfalls I am overlooking.
`27` on its own is allowed. `27-` should be understood as 27 to end. `-10` should be understood as begin to 10. `10-27` should be understood as 10 to 27, inclusive.
Upvotes: 2
Views: 395
Reputation: 387745
(Original answer below)
Okay, with your clarification update, you won’t be able to do that as easily with regular expressions, unless you also accept a bare `-` as “from beginning to end”. In that case, you could use this:
>>> r = re.compile(r'^((?P<fixed>\d+)|(?P<begin>\d+)?-(?P<end>\d+)?)$')
>>> r.match('123').groupdict()
{'begin': None, 'fixed': '123', 'end': None}
>>> r.match('123-').groupdict()
{'begin': '123', 'fixed': None, 'end': None}
>>> r.match('-456').groupdict()
{'begin': None, 'fixed': None, 'end': '456'}
>>> r.match('-').groupdict()
{'begin': None, 'fixed': None, 'end': None}
>>> r.match('123-456').groupdict()
{'begin': '123', 'fixed': None, 'end': '456'}
Otherwise, you will have to use more explicit cases:
>>> r = re.compile(r'^((?P<fixed>\d+)|-(?P<endonly>\d+)|(?P<begin>\d+)(?:-(?P<end>\d+)?))$')
>>> r.match('123').groupdict()
{'endonly': None, 'begin': None, 'fixed': '123', 'end': None}
>>> r.match('123-').groupdict()
{'endonly': None, 'begin': '123', 'fixed': None, 'end': None}
>>> r.match('-456').groupdict()
{'endonly': '456', 'begin': None, 'fixed': None, 'end': None}
>>> r.match('123-456').groupdict()
{'endonly': None, 'begin': '123', 'fixed': None, 'end': '456'}
>>> r.match('-') is None
True
As you can see, the amount of work necessary to correctly interpret the results afterwards increases in complexity.
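To illustrate the extra interpretation work, here is a minimal sketch that turns the group dictionary of the explicit-case pattern into a `(begin, end)` pair. The helper name `interpret` and the choice of `None` for an open bound are assumptions made for this example, not part of the answer above.

```python
import re

# Pattern copied from the answer above, written as a raw string.
RANGE_RE = re.compile(
    r'^((?P<fixed>\d+)|-(?P<endonly>\d+)|(?P<begin>\d+)(?:-(?P<end>\d+)?))$'
)

def interpret(expr):
    """Hypothetical helper: return (begin, end); None marks an open bound."""
    m = RANGE_RE.match(expr)
    if m is None:
        raise ValueError('Not a correct range: %r' % expr)
    g = m.groupdict()
    if g['fixed'] is not None:
        # A single number is a range of length one.
        n = int(g['fixed'])
        return n, n
    if g['endonly'] is not None:
        # "-456": open at the beginning.
        return None, int(g['endonly'])
    # "123-" or "123-456": begin is always present here.
    begin = int(g['begin'])
    end = int(g['end']) if g['end'] is not None else None
    return begin, end
```

Every named group needs its own branch, which is the bookkeeping cost mentioned above.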
However, I would recommend moving away from regular expressions for this and doing the parsing directly in Python, for example like this:
import re

def parseRange(expr):
    # accept only digits with at most one dash between them
    if not expr or not re.match(r'^\d*-?\d*$', expr):
        raise ValueError('Not a correct range')
    # single number
    if '-' not in expr:
        return int(expr), int(expr)
    else:
        # an empty side means the range is open in that direction
        begin, end = expr.split('-')
        begin = float('-inf') if begin == '' else int(begin)
        end = float('inf') if end == '' else int(end)
        return begin, end
Used like this:
>>> parseRange('123')
(123, 123)
>>> parseRange('123-')
(123, inf)
>>> parseRange('-456')
(-inf, 456)
>>> parseRange('123-456')
(123, 456)
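Since the question is about printing a range, the infinite bounds eventually have to be resolved against the numbers that actually exist. The following is only a sketch: `clamp_range`, `first` and `last` are hypothetical names, and it assumes the valid numbers form a contiguous inclusive interval.

```python
def clamp_range(begin, end, first, last):
    """Clamp a (begin, end) pair, possibly infinite, to [first, last].

    Returns a range object covering the inclusive interval; it is empty
    if the requested interval does not overlap [first, last].
    """
    lo = int(max(begin, first))  # -inf collapses to `first`
    hi = int(min(end, last))     # +inf collapses to `last`
    return range(lo, hi + 1)

# Example: an open-ended request against numbers 1..100
for n in clamp_range(98, float('inf'), 1, 100):
    print(n)
```

Returning a `range` keeps the inclusive-end convention from the question in one place, so callers do not need to remember the `+ 1`.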
You can just get rid of some of those capturing groups to simplify it:
^(?P<begin>\d+)-(?P<end>\d+)$
This also fixes the problem that the `-` could be there without a second number and that theoretically (not practically) begin and end could be separated by nothing.
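A quick way to see the difference is to run both patterns on the edge cases. This is a small sketch; the variable names `original` and `simplified` are chosen for the example only.

```python
import re

# The pattern from the question and the simplified one from this answer.
original = re.compile(r'^(?P<Begin>\d+)((-?)(?P<end>\d+)?)$')
simplified = re.compile(r'^(?P<begin>\d+)-(?P<end>\d+)$')

for text in ('27-84', '27-', '2784'):
    print(text, bool(original.match(text)), bool(simplified.match(text)))
```

The original pattern accepts all three inputs, while the simplified one accepts only `27-84`.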
If you don’t always want to match a range but also want to allow a single number, you can put the range part in a non-capturing group. That way you don’t end up with additional groups:
^(?P<begin>\d+)(?:-(?P<end>\d+))?$
And finally, if you want to accept `to` as a separator as well, you can even do that:
^(?P<begin>\d+)(?:(?:-|\s+to\s+)(?P<end>\d+))?$
>>> r = re.compile(r'^(?P<begin>\d+)(?:-(?P<end>\d+))?$')
>>> r.match('123').groupdict()
{'begin': '123', 'end': None}
>>> r.match('123-456').groupdict()
{'begin': '123', 'end': '456'}
>>> r = re.compile(r'^(?P<begin>\d+)(?:(?:-|\s+to\s+)(?P<end>\d+))?$')
>>> r.match('123-456').groupdict()
{'begin': '123', 'end': '456'}
>>> r.match('123 to 456').groupdict()
{'begin': '123', 'end': '456'}
Upvotes: 6
Reputation: 336178
To allow and correctly handle all the cases you mentioned in your edit, you need a different regex:
^(?=.*\d)(?P<begin>\d*)(?P<range>-?)(?P<end>\d*)$
The lookahead at the start is necessary to assert that there is at least one digit in the string (because `begin` or `end` can be optional, but not both).
As a verbose regex with explanations:
^                 # Start of string
(?=.*\d)          # Assert that there is at least one digit somewhere
(?P<begin>\d*)    # Match 0 or more digits --> Begin
(?P<range>-?)     # Match 0 or 1 dash
(?P<end>\d*)      # Match 0 or more digits --> End
$                 # End of string
See it live on regex101.com.
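In Python, the verbose form can be compiled directly with `re.VERBOSE`. The sketch below wraps it in a hypothetical `parse` function; treating an empty group as `None` and a dash-less input as a single number are assumptions based on the cases listed in the question.

```python
import re

RANGE_RE = re.compile(r"""
    ^                 # Start of string
    (?=.*\d)          # Assert that there is at least one digit somewhere
    (?P<begin>\d*)    # Match 0 or more digits --> begin
    (?P<range>-?)     # Match 0 or 1 dash
    (?P<end>\d*)      # Match 0 or more digits --> end
    $                 # End of string
""", re.VERBOSE)

def parse(expr):
    """Hypothetical helper: return (begin, end), or None if expr is invalid."""
    m = RANGE_RE.match(expr)
    if m is None:
        return None
    # Empty captures mean "open" on that side.
    begin = int(m.group('begin')) if m.group('begin') else None
    end = int(m.group('end')) if m.group('end') else None
    if not m.group('range'):
        # No dash: a single number, which `begin` captured greedily.
        end = begin
    return begin, end
```

Note that without a dash the greedy `begin` group consumes all digits, so `end` is always empty in that case and has to be filled in by the caller.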
Upvotes: 1