verifying formatted messages

Question

I'm writing software that analyses its input and returns a result. One requirement is that it generates zero or more warnings or errors and includes them with the result. I'm also writing unit tests which use contrived data to verify that the right warnings are emitted.

I need to be able to parse the warnings/errors and verify that the expected messages are correctly emitted. I figured I'd store the messages in a container and reference them with a message ID which is pretty similar to how I've done localization in the past.

errormessages.py right now looks pretty similar to:

from enum import IntEnum

NO_MESSAGE = ''
HELLO = 'Hello, World'
GOODBYE = 'Goodbye'

class MsgId(IntEnum):
    NO_MESSAGE = 0
    HELLO = 1
    GOODBYE = 2

Msg = {
    MsgId.NO_MESSAGE: NO_MESSAGE,
    MsgId.HELLO: HELLO,
    MsgId.GOODBYE: GOODBYE,
}

So then the analysis can look similar to this:

from errormessages import Msg, MsgId
def analyse(_):
    errors = []
    errors.append(Msg[MsgId.HELLO])
    return _, errors

And in the unit tests I can do something similar to

from errormessages import Msg, MsgId
from my import analyse
def test_hello():
    _, errors = analyse('toy')
    assert Msg[MsgId.HELLO] in errors

But some of the messages get formatted and I think that's going to play hell with parsing the messages for unit tests. I was thinking I'd add flavors of the messages; one for formatting and the other for parsing:

updated errormessages.py:

from enum import IntEnum
import re
FORMAT_NO_MESSAGE = ''
FORMAT_HELLO = 'Hello, {}'
FORMAT_GOODBYE = 'Goodbye'

PARSE_NO_MESSAGE = re.compile(r'^$')
PARSE_HELLO = re.compile(r'^Hello, (.*)$')
PARSE_GOODBYE = re.compile(r'^Goodbye$')

class MsgId(IntEnum):
    NO_MESSAGE = 0
    HELLO = 1
    GOODBYE = 2

Msg = {
    MsgId.NO_MESSAGE: (FORMAT_NO_MESSAGE, PARSE_NO_MESSAGE),
    MsgId.HELLO: (FORMAT_HELLO, PARSE_HELLO),
    MsgId.GOODBYE: (FORMAT_GOODBYE, PARSE_GOODBYE),
}

So then the analysis can look like:

from errormessages import Msg, MsgId
def analyse(_):
    errors = []
    errors.append(Msg[MsgId.HELLO][0].format('World'))
    return _, errors

And in the unit tests I can do:

from errormessages import Msg, MsgId
from my import analyse

def test_hello():
    _, errors = analyse('toy')
    expected = {v: [] for v in MsgId}
    # Keep only the errors that actually match; match() returns None otherwise.
    expected[MsgId.HELLO] = [
        msg for msg in errors
        if Msg[MsgId.HELLO][1].match(msg)
    ]
    for msg_id, matches in expected.items():
        if msg_id == MsgId.HELLO:
            assert matches
        else:
            assert not matches

Is there a better or simpler way? In particular, each message is effectively written twice: once for the formatter and once for the regular expression. Is there a way to use a single string for both formatting and regular expression capturing?

Uri Granta · Accepted Answer

Assuming the messages are all stored as format string templates (e.g. "Hello", or "Hello, {}" or "Hello, {firstname} {surname}"), then you could generate the regexes directly from the templates:

import re
import random
import string

def format_string_to_regex(format_string: str) -> re.Pattern:
    """Convert a format string template to a regex."""
    # Placeholder that is very unlikely to occur in a real message.
    unique_string = ''.join(random.choices(string.ascii_letters, k=24))
    # Swap each replacement field ({}, {name}, ...) for the placeholder.
    stripped_fields = re.sub(r"\{[^\{\}]*\}(?!\})", unique_string, format_string)
    pattern = re.escape(stripped_fields).replace(unique_string, "(.*)")
    # Doubled braces in a template stand for a single literal brace.
    pattern = pattern.replace(r"\{\{", r"\{").replace(r"\}\}", r"\}")
    return re.compile(f"^{pattern}$")

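# Assumes Msg and MsgId come from errormessages, with Msg mapping
# each ID to a single format string (not a (format, regex) tuple).
def is_error_message(error: str, expected_message: MsgId) -> bool:
    """Returns whether the error plausibly matches the MsgId."""
    expected_format = format_string_to_regex(Msg[expected_message])
    return bool(expected_format.match(error))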
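
As a variation on the same idea (not the code above, and only a sketch), the standard library's string.Formatter().parse can split a template into literal text and replacement fields, which avoids the random placeholder and handles doubled braces for you. The names template_to_regex and matches are made up for illustration, and the Msg table here is a hypothetical single-template version of errormessages.py:

```python
import re
from enum import IntEnum
from string import Formatter


class MsgId(IntEnum):
    HELLO = 1
    GOODBYE = 2


# Hypothetical table: one format string per message ID, no separate regex.
Msg = {
    MsgId.HELLO: "Hello, {}",
    MsgId.GOODBYE: "Goodbye",
}


def template_to_regex(template: str) -> re.Pattern:
    """Build a regex from a str.format template (illustrative helper)."""
    parts = []
    # parse() yields (literal_text, field_name, format_spec, conversion);
    # field_name is None when a chunk has no replacement field, and
    # literal_text already has '{{' / '}}' reduced to single braces.
    for literal, field, _spec, _conv in Formatter().parse(template):
        parts.append(re.escape(literal))
        if field is not None:
            parts.append("(.*)")
    # DOTALL so a formatted value containing newlines still matches.
    return re.compile("".join(parts), re.DOTALL)


def matches(error: str, msg_id: MsgId) -> bool:
    """Return whether the whole error string fits the template for msg_id."""
    return template_to_regex(Msg[msg_id]).fullmatch(error) is not None


# Usage in the style of the unit test from the question.
errors = [Msg[MsgId.HELLO].format("World")]
assert any(matches(e, MsgId.HELLO) for e in errors)
assert not any(matches(e, MsgId.GOODBYE) for e in errors)
```

Either way, the match is only "plausible": (.*) accepts any text, so two templates that differ only in their fields cannot be told apart by the regex alone.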
