Matthias

Reputation: 235

Regex difference between Python 2 and Python 3

I want to port this code from Python 2 to Python 3:

p = re.compile(ur"(?s)<u>(.*?)<\/u>")
subst = "\underline{\\1}"
raw_html = re.sub(p, subst, raw_html)

I already figured out that the ur prefix has to be changed to just r:

p = re.compile(r"(?s)<u>(.*?)<\/u>")
subst = "\underline{\\1}"
raw_html = re.sub(p, subst, raw_html)

However, it does not work. Python complains about this:

cd build && PYTHONWARNINGS="ignore" python3 ../src/katalog.py --katalog 1
Traceback (most recent call last):
  File "src/katalog.py", line 11, in <module>
    from common import *
  File "src/common.py", line 207
    subst = "\underline{\\1}"
                             ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
make: *** [katalog1] Error 1

However, changing it to "\underline" does not help either: the text is then not replaced.

Upvotes: 2

Views: 86

Answers (1)

Ryszard Czech

Reputation: 18611

Use:

import re
raw_html = r"<u>1</u> and <u>2</u>"
p = re.compile(r"(?s)<u>(.*?)</u>")
subst = r"\\underline{\1}"
raw_html = re.sub(p, subst, raw_html)
print(raw_html)

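The result is \underline{1} and \underline{2}. The SyntaxError occurs because in Python 3 every non-raw string literal is a Unicode string, so "\u" starts a \uXXXX escape that must be followed by four hex digits. Inside the replacement string, use a double backslash to produce a single literal backslash. Raw string literals make both the pattern and the replacement easier to write in Python.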
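As a rough sketch of the alternatives (the names sample, raw_template, plain_template and to_underline are my own and not from the question): the raw template is just a shorter spelling of a non-raw string with every backslash doubled, and a replacement function avoids the template escaping altogether, because re.sub inserts its return value literally.

```python
import re

pattern = re.compile(r"(?s)<u>(.*?)</u>")

# Two spellings of the same replacement template. re.sub reads "\\" in the
# template as one literal backslash and "\1" as a reference to group 1.
raw_template = r"\\underline{\1}"
plain_template = "\\\\underline{\\1}"

def to_underline(match):
    # Hypothetical helper: the string returned by a replacement function is
    # inserted as-is, so only normal Python string escaping applies here.
    return "\\underline{" + match.group(1) + "}"

# Illustrative input; the second tag spans two lines to exercise (?s).
sample = "<u>1</u> and <u>two\nlines</u>"

via_template = pattern.sub(raw_template, sample)
via_function = pattern.sub(to_underline, sample)

print(via_template)
print(via_function)
```

Both calls should give the same text. The function form is probably the safer choice when the replacement is built from data that may itself contain backslashes.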

Upvotes: 2
