Reputation: 994
I think my chances are slim based on the responses to other questions related to regular expressions.
I am trying to parse numbers in different representations:
12345(234567)
12345(234.56K)
I cannot control the source format.
I suppose I could write a different regular expression for each format, but how do I detect which format I am looking at? Do I have to brute-force it by looking for the letter 'K'?
Upvotes: 2
Views: 716
Reputation: 8585
This kind of thing is often done by iterating over a list of regular expressions and stopping at the first one that matches, because converting the string to a number needs special parsing beyond what a regular expression alone can do. That means you need to order the patterns so that the first match is the right one. In this case, you might do something like this:
import re

# (compiled pattern, multiplier applied to the bracketed number)
PARSERS = (
    (re.compile(r'([0-9]+)\(([-+0-9.]+)[mM]\)'), 1000000),
    (re.compile(r'([0-9]+)\(([-+0-9.]+)[kK]\)'), 1000),
    (re.compile(r'([0-9]+)\(([-+0-9.]+)\)'), 1),
)

def parse(num):
    # Try each pattern in order; the first one that matches wins.
    for pattern, multiplier in PARSERS:
        match = pattern.match(num)
        if match is not None:
            return float(match.group(1)), float(match.group(2)) * multiplier
    raise ValueError("Failed to parse")
As an aside, this pattern is common in other places too, such as deciding which function will handle a web request based on the URL.
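To illustrate that aside, here is a minimal sketch of the same "first matching pattern wins" loop applied to URL paths. The handler functions, the `ROUTES` table, the `dispatch` name and the URL shapes are all made up for this example; real web frameworks have their own routing APIs.

```python
import re

# Hypothetical handlers; names and URL shapes are illustrative only.
def show_user(user_id):
    return "user %s" % user_id

def show_article(slug):
    return "article %s" % slug

# (compiled pattern, handler) pairs, checked in order.
ROUTES = (
    (re.compile(r'^/users/(\d+)$'), show_user),
    (re.compile(r'^/articles/([\w-]+)$'), show_article),
)

def dispatch(path):
    # Same loop as the number parser: stop at the first pattern that matches.
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match is not None:
            return handler(*match.groups())
    raise LookupError("No route for %s" % path)

print(dispatch('/users/42'))
print(dispatch('/articles/hello-world'))
```

As with the number parser, the order of the table only matters when more than one pattern could match the same input.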
Just for fun, here's an alternative implementation that uses dictionary lookups and a single regular expression instead of iteration:
import re

# Suffix (upper-cased) -> multiplier; the empty string means "no suffix".
MULTIPLIER = {
    'M': 1000000,
    'K': 1000,
    '': 1,
}

PATTERN = re.compile(r'(\d+)\(([-+.\d]+)([kKmM]?)\)')

def parse(num):
    match = PATTERN.match(num)
    if match is None:
        raise ValueError("Failed to parse")
    first, second, suffix = match.groups()
    suffix = suffix.upper()
    # Defensive check in case PATTERN later accepts a suffix missing from MULTIPLIER.
    if suffix not in MULTIPLIER:
        raise ValueError("Unrecognised multiplier %s" % suffix)
    return float(first), float(second) * MULTIPLIER[suffix]
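As a quick check against the two inputs from the question, here is a minimal, self-contained sketch. It repeats a compact copy of the single-regex parser so that it runs on its own. The malformed sample strings are made up for illustration, and the `K` result is a float, so it may differ from 234560 by a tiny rounding error.

```python
import re

MULTIPLIER = {'M': 1000000, 'K': 1000, '': 1}
PATTERN = re.compile(r'(\d+)\(([-+.\d]+)([kKmM]?)\)')

def parse(num):
    match = PATTERN.match(num)
    if match is None:
        raise ValueError("Failed to parse")
    first, second, suffix = match.groups()
    return float(first), float(second) * MULTIPLIER[suffix.upper()]

# The two formats from the question.
print(parse("12345(234567)"))   # (12345.0, 234567.0)
print(parse("12345(234.56K)"))  # second value is approximately 234560

# Made-up malformed inputs: both end in ValueError, either from the
# explicit raise or from float() rejecting a string such as "1.2.3".
for bad in ("abc", "12345(1.2.3K)"):
    try:
        parse(bad)
    except ValueError as exc:
        print("rejected %r: %s" % (bad, exc))
```

Note that `match` only anchors the start of the string, so trailing text after the closing parenthesis is ignored; `PATTERN.fullmatch` could be swapped in if that should be an error.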
Upvotes: 3