Regular Expression for Financial Numbers

Question

I think my chances are slim based on the responses to other questions related to regular expressions.

I am trying to parse numbers in different representations:

12345(234567)
12345(234.56K)

I have no control over the source format.

I suppose I could write a different regular expression for each format, but how do I detect which format a given string uses? Do I have to resort to a brute-force check for the letter 'K'?

z0r · Accepted Answer

This kind of thing is often done by iterating over a bunch of regular expressions and stopping when you find one that matches - because your conversion from a string to a number needs special parsing beyond the capabilities of regular expressions. That means you need to order them in a way that you know will give the right answer. In this case, you might do something like this:

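import re

# Ordered (pattern, multiplier) pairs; the first pattern that matches wins.
# \( and \) match the literal parentheses around the second number.
PARSERS = (
    (re.compile(r'([0-9]+)\(([-+0-9.]+)[mM]\)$'), 1000000),
    (re.compile(r'([0-9]+)\(([-+0-9.]+)[kK]\)$'), 1000),
    (re.compile(r'([0-9]+)\(([-+0-9.]+)\)$'), 1),
)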

def parse(num):
    for pattern, multiplier in PARSERS:
        match = pattern.match(num)
        if match is not None:
            return float(match.group(1)), float(match.group(2)) * multiplier
    raise ValueError("Failed to parse")
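To see why the order of the patterns can matter, here is a small sketch. The two patterns below are hypothetical and deliberately loose (they have no closing parenthesis or end anchor), so they are not the ones used above. With loose patterns like these, a generic pattern placed first will match the prefix of the string and quietly ignore the suffix:

```python
import re

# Hypothetical, deliberately loose patterns used only to illustrate ordering.
GENERIC = (re.compile(r'([0-9]+)\(([-+0-9.]+)'), 1)
THOUSANDS = (re.compile(r'([0-9]+)\(([-+0-9.]+)[kK]'), 1000)

def parse_with(parsers, num):
    """Return (first, second * multiplier) using the first pattern that matches."""
    for pattern, multiplier in parsers:
        match = pattern.match(num)
        if match is not None:
            return float(match.group(1)), float(match.group(2)) * multiplier
    raise ValueError("Failed to parse")

# Generic pattern first: it matches the prefix, so the K suffix is never seen.
wrong = parse_with((GENERIC, THOUSANDS), "12345(234.56K)")
# Specific pattern first: the K suffix is seen and the multiplier is applied.
right = parse_with((THOUSANDS, GENERIC), "12345(234.56K)")
print(wrong)
print(right)
```

The first call yields the unscaled value 234.56, the second roughly 234560. Anchoring every pattern through to the closing parenthesis makes the patterns mutually exclusive, at which point the order stops affecting the result.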

As an aside, this pattern is common in other places too, such as deciding which function will handle a web request based on the URL.
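As a rough illustration of that aside, a URL dispatcher can use the same ordered loop. The handlers, routes, and the dispatch function below are invented for this sketch and do not come from any particular web framework:

```python
import re

# Hypothetical handlers, invented for illustration only.
def show_user(user_id):
    return "user %s" % user_id

def show_index():
    return "index"

# Ordered (pattern, handler) pairs, tried from top to bottom.
ROUTES = (
    (re.compile(r'/users/(\d+)$'), show_user),
    (re.compile(r'/$'), show_index),
)

def dispatch(path):
    """Call the handler of the first route whose pattern matches the path."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match is not None:
            # Captured groups become the handler's positional arguments.
            return handler(*match.groups())
    raise LookupError("No handler for %s" % path)

print(dispatch("/users/42"))
print(dispatch("/"))
```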

Just for fun, here's an alternative implementation that uses dictionary lookups and a single regular expression instead of iteration:

MULTIPLIER = {
    'M': 1000000,
    'K': 1000,
    '': 1,
}
PATTERN = re.compile(r'(\d+)\(([-+.\d]+)([kKmM]?)\)$')

def parse(num):
    match = PATTERN.match(num)
    if match is None:
        raise ValueError("Failed to parse")
    first, second, suffix = match.groups()
    suffix = suffix.upper()
    if suffix not in MULTIPLIER:
        raise ValueError("Unrecognised multiplier %s" % suffix)
    return float(first), float(second) * MULTIPLIER[suffix]
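One caveat with the character class [-+.\d]+ is that it also accepts strings such as 1.2.3, which float() then rejects with its own ValueError. If a single, consistent error is preferred, the number syntax can be tightened in the pattern itself. The sketch below is one possible variant; the names STRICT_PATTERN and parse_strict are hypothetical, and the assumed syntax (optional sign, digits, optional fractional part) rejects forms like .5 that float() would accept:

```python
import re

MULTIPLIER = {'M': 1000000, 'K': 1000, '': 1}

# Assumed number syntax: optional sign, digits, optional fractional part.
STRICT_PATTERN = re.compile(r'(\d+)\(([-+]?\d+(?:\.\d+)?)([kKmM]?)\)$')

def parse_strict(num):
    match = STRICT_PATTERN.match(num)
    if match is None:
        raise ValueError("Failed to parse %r" % num)
    first, second, suffix = match.groups()
    # The pattern only admits k, K, m, M or nothing, so the lookup cannot fail.
    return float(first), float(second) * MULTIPLIER[suffix.upper()]

print(parse_strict("12345(234567)"))
print(parse_strict("12345(2M)"))
```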
