projetmbc

Reputation: 1452

Odd or even number of backslashes and escaped character

I have a little problem with the following code.

import re

pattern = re.compile(r"((?:^|[^\\@]|\\.)+)@")

for text in [
    r"ok@\@.py",
    r"ok@\\@.py",
    r"ok@\\\@.py",
    r"ok@\\\\@.py",
    r"ok@\\\\\@.py",
]:
    search = re.search(pattern, text)
    print('---', text, sep="\n")

    if search:
        print(pattern.sub(r"\1<star>", text))

    else:
        print('<< NOTHING FOUND ! >>')

This prints:

---
ok@\@.py
ok<star>\@.py
---
ok@\\@.py
ok<star>\\<star>.py
---
ok@\\\@.py
ok<star>\\\<star>.py
---
ok@\\\\@.py
ok<star>\\\\<star>.py
---
ok@\\\\\@.py
ok<star>\\\\\<star>.py

The third output is wrong: the input contains an escaped backslash followed by an escaped @, so that @ should be left alone. The same problem appears with more backslashes: in the last input, two escaped backslashes are followed by an escaped @, yet that @ is replaced as well.

Here is the expected output, where an @ counts as escaped only when an odd number of \ precedes it.

---
ok@\@.py
ok<star>\@.py
---
ok@\\@.py
ok<star>\\<star>.py
---
ok@\\\@.py
ok<star>\\\@.py
---
ok@\\\\@.py
ok<star>\\\\<star>.py
---
ok@\\\\\@.py
ok<star>\\\\\@.py
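
For reference, the odd/even rule behind this expected output can be written without any regex. This is only a sketch to pin down the rule; the helper name `is_escaped` is hypothetical and not part of the code above.

```python
def is_escaped(text, index):
    """Return True if text[index] is preceded by an odd number of backslashes."""
    count = 0
    pos = index - 1
    # Walk backwards over the run of backslashes directly before the character.
    while pos >= 0 and text[pos] == "\\":
        count += 1
        pos -= 1
    return count % 2 == 1


sample = r"ok@\\\@.py"
print(is_escaped(sample, sample.index("@")))   # first @, no backslash before it
print(is_escaped(sample, sample.rindex("@")))  # last @, three backslashes before it
```

With three backslashes before the last @, the first two form an escaped backslash and the third escapes the @, so only the first @ should become <star>.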

What is wrong with my regex, and how can I fix it?

Upvotes: 3

Views: 448

Answers (1)

karthik manchala

Reputation: 13640

Use the following regex:

pattern = re.compile(r"(?<!\\)((?:\\\\)*)@")

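And replace with r"\1<star>". The group captures the pairs of backslashes before the @, so they have to be written back; a bare <star> would drop them.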

Output:

ok<star>\@.py 
ok<star>\\<star>.py
ok<star>\\\@.py
ok<star>\\\\<star>.py
ok<star>\\\\\@.py

See DEMO
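
As for why the original pattern fails: it has nothing that stops a match from starting in the middle of a backslash run, so on the third input the engine starts at the second backslash, reads `\\` as one escaped character, and then finds a "free" @. The negative lookbehind `(?<!\\)` rules that out by requiring the run of backslashes to start at its true beginning. Below is a minimal runnable sketch of the fix applied to the five inputs from the question. The wrapper name `star_unescaped` is hypothetical, and the replacement is r"\1<star>" so that the captured backslash pairs are restored.

```python
import re

# An "@" preceded by an even-length (possibly empty) run of backslashes.
# The lookbehind forces the run to start where the backslashes really begin.
pattern = re.compile(r"(?<!\\)((?:\\\\)*)@")


def star_unescaped(text):
    # \1 puts back the backslash pairs consumed by the capture group.
    return pattern.sub(r"\1<star>", text)


for text in [
    r"ok@\@.py",
    r"ok@\\@.py",
    r"ok@\\\@.py",
    r"ok@\\\\@.py",
    r"ok@\\\\\@.py",
]:
    print('---', text, star_unescaped(text), sep="\n")
```

Since the group is only needed to restore the backslashes, a capture-free variant that replaces only the @ itself would also be possible, but the version above stays closest to the pattern given in this answer.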

Upvotes: 2
