vinnie

Reputation: 45

pytest: assert escaped characters with re.escape() fails

I have a function that escapes characters using re.escape(). It seems to work when I try it manually, so I wanted to test it with pytest. But I couldn't get the tests to pass, so after a few attempts I tried something like this:

    def test_escape():
>       assert re.escape('!') == "\\!"
E       AssertionError: assert '!' == '\\!'
E         - !
E         + \!

test/test_e.py:6: AssertionError

I also tested it in the interpreter, where it works without any problem:

>>> re.escape('!') == '\\!'
True

When I disable pytest's output capturing with "-s" and print the output of re.escape('!'), I get "!" and not "\!", which does not happen in the interpreter.

I tried to monkeypatch re.escape, forcing "\!" as the output, and it magically works. This obviously does not solve my problem, but it points to some problem with re.escape that I don't understand:

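import re
import pytest

@pytest.fixture
def mock_escape(monkeypatch):
    # replace re.escape with a stub that always returns a backslash followed by "!"
    monkeypatch.setattr(re, "escape", lambda x: "\\!")

def test_escape(mock_escape):
    assert re.escape('!') == "\\!"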

...

test/test_e.py .

======================================== 1 passed in 0.07s =========================================
all test passed

Just out of curiosity, I did the same thing with my original function (without monkeypatching, but editing its return value), and it works in that case too. So the problem is not caused by the import.

EDIT: As tmt discovered, the result depends on the Python or pytest version. The problem occurs with Python 3.7.2 and pytest 5.2.1. The problem does NOT occur with Python 3.6.3 and pytest 4.5.0. I first thought it was almost certainly a bug (more likely in pytest, in my opinion), but as Guy's answer explains, it is simply a behaviour change of re.escape().

Upvotes: 3

Views: 2914

Answers (1)

Guy

Reputation: 50899

If you look at re.py, you will see that escape() uses a defined list of special characters:

_special_chars_map = {i: '\\' + chr(i) for i in b'()[]{}?*+-|^$\\.&~# \t\n\r\v\f'}

def escape(pattern):
    """
    Escape special characters in a string.
    """
    if isinstance(pattern, str):
        return pattern.translate(_special_chars_map)
    else:
        pattern = str(pattern, 'latin1')
        return pattern.translate(_special_chars_map).encode('latin1')

and ! is not included there, so re.escape('!') returns !, not \!.

assert re.escape('[') == '\\['

for example will work.
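
Because the exact escaped form depends on the Python version, a test can instead check the property that usually matters: the escaped pattern matches the original text literally. This is only a minimal sketch; `escaped_matches_literally` is a hypothetical helper name, not part of `re` or pytest.

```python
import re

def escaped_matches_literally(text):
    """Return True if re.escape(text) yields a pattern that matches exactly `text`."""
    return re.fullmatch(re.escape(text), text) is not None

def test_escape_is_version_independent():
    # These hold whether or not a given character gets a backslash prefix.
    for text in ['!', '[', 'a.b', '1+1=2?', 'tab\there']:
        assert escaped_matches_literally(text)
    # The escaped '.' must not act as a wildcard.
    assert re.fullmatch(re.escape('a.b'), 'axb') is None
```

A test written this way does not care whether '!' is returned as `!` or `\!`, so it should pass on both sides of the behaviour change.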

Update:

This answer applies to Python 3.7; on Python 3.6 the assertion from the question passes. Pull request #1007 changed escape():

re.escape() now escapes only special characters.

Previous version:

_alphanum_str = frozenset("_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890")
_alphanum_bytes = frozenset(b"_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890")

def escape(pattern):
    if isinstance(pattern, str):
        alphanum = _alphanum_str
        s = list(pattern)
        for i, c in enumerate(pattern):
            if c not in alphanum:
                if c == "\000":
                    s[i] = "\\000"
                else:
                    s[i] = "\\" + c
        return "".join(s)
    else:
        alphanum = _alphanum_bytes
        s = []
        esc = ord(b"\\")
        for c in pattern:
            if c in alphanum:
                s.append(c)
            else:
                if c == 0:
                    s.extend(b"\\000")
                else:
                    s.append(esc)
                    s.append(c)
        return bytes(s)

It was modified on Apr 13 2017, so judging by the version history, re.escape('!') == '\\!' should hold on Python 3.6 and older versions.
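
If a test really has to assert the exact escaped string, the expected value can be chosen from the running interpreter's version. This is a sketch under the assumption, taken from the version history above, that the new behaviour starts with Python 3.7; `EXPECTED_BANG` is a name made up for this example.

```python
import re
import sys

# Assumption: '!' stopped being escaped in Python 3.7 (see the change described above).
EXPECTED_BANG = '!' if sys.version_info >= (3, 7) else '\\!'

def test_escape_bang():
    assert re.escape('!') == EXPECTED_BANG
    # '[' is in the special characters list, so it is escaped either way
    assert re.escape('[') == '\\['
```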

Upvotes: 2
