I've got a Python script which has a class defined with this method:
@staticmethod
def _sanitized_test_name(orig_name):
    return re.sub(r'[`‘’\"]*', '', re.sub(r'[\r\n\/\:\?\<\>\|\*\%]*', '', orig_name.encode('utf-8')))
I'm able to run the script from the command prompt without any issues. But when I paste the code of the full class into the console, I get SyntaxError: unexpected character after line continuation character:
>>> return re.sub(r'[`‘’\"]*', '', re.sub(r'[\r\n\/\:\?\<\>\|\*\%]*', '', orig_name.encode('utf-8')))
File "<stdin>", line 1
return re.sub(r'[``'\"]*', '', re.sub(r'[\r\n\/\:\?\<\>\|\*\%]*', '', orig_name.encode('utf-8')))
^
SyntaxError: unexpected character after line continuation character
If I skip that method while pasting, it works. Note that there is a difference between my original line and what's shown in the error: r'[`‘’\"]*' vs r'[``'\"]*'. Replacing that with ur'[`‘’"]*' gives SyntaxError: EOL while scanning string literal.
It seems the Python console is reading ‘ as a stylised ` (backtick) and ’ as a stylised ' (single quote), when I really mean the Unicode left and right single quotation marks. I've got # -*- coding: utf-8 -*- at the top of my script, which I paste into the console as well.
Focusing on just the expression causing the error, r'[`‘’"]*':
>>> r'[`‘’"]*'
File "<stdin>", line 1
r'[``'"]*'
^
SyntaxError: EOL while scanning string literal
>>> ur'[`‘’"]*' # with the unicode modifier
File "<stdin>", line 1
ur'[``'"]*'
^
SyntaxError: EOL while scanning string literal
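If the terminal doesn't accept Unicode input, it converts ‘ to ` and ’ to '. The substituted ' closes the literal early, leaving just r'[``', so everything after it is parsed as new tokens. With the original \" the next character is a stray backslash, which is a line continuation character and must be followed by a newline, hence "unexpected character after line continuation character". Without the backslash, the leftover "]*' starts a double-quoted string that never closes, hence "EOL while scanning string literal". Adding the u prefix doesn't help, because the characters are already replaced before the parser sees them.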
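A minimal sketch for checking this explanation without a misbehaving console: it applies the two substitutions (‘ to `, ’ to ') to the expression and hands the result to the built-in compile(). The helper names mangle and syntax_error are hypothetical, and the substitution table is an assumption that mirrors what this particular terminal did. Exact messages differ between versions (Python 3.10+ says "unterminated string literal" where older versions say "EOL while scanning string literal"), so the code just prints whatever message it gets. The source is kept ASCII-only via \u escapes so it runs on Python 2.7 and Python 3.

```python
# Hypothetical substitution table mirroring the console's behaviour:
# U+2018 became a backtick, U+2019 became an apostrophe.
CONSOLE_SUBSTITUTIONS = {u"\u2018": u"`", u"\u2019": u"'"}


def mangle(source):
    """Apply the console's curly-quote substitutions to a line of source."""
    for curly, ascii_char in CONSOLE_SUBSTITUTIONS.items():
        source = source.replace(curly, ascii_char)
    return source


def syntax_error(source):
    """Return the SyntaxError message for `source`, or None if it compiles."""
    try:
        compile(source, "<stdin>", "eval")
    except SyntaxError as exc:
        return exc.msg
    return None


# Both forms of the expression from the question, spelled with \u escapes
# so this file itself contains only ASCII characters.
with_backslash = u"r'[`\u2018\u2019\\\"]*'"   # original form, with \" in the class
without_backslash = u"r'[`\u2018\u2019\"]*'"  # simplified form, plain "

for source in (with_backslash, without_backslash):
    print(syntax_error(source))          # None: the intended text compiles
    print(syntax_error(mangle(source)))  # a SyntaxError message after mangling
```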
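So the workaround is to split the regex and use unichr() with the code points of the two quotes, U+2018 and U+2019. These are hexadecimal, so they must be written as 0x2018 and 0x2019 (unichr(2018) would give U+07E2 instead):
>>> r'[`' + unichr(0x2018) + unichr(0x2019) + r'"]*'
u'[`\u2018\u2019"]*'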
(And the raw string prefix r'' isn't required for this particular regex, since it contains no backslashes.)
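A minimal sketch of an alternative to unichr(): spell the code points as \u escapes inside a u'' literal. The source then contains only ASCII characters, so nothing can be lost when it is pasted into a console that can't take Unicode input. The helper name sanitized_test_name is hypothetical, and unlike the question's method it assumes the name is already a unicode string rather than encoding it to UTF-8 bytes first. It runs unchanged on Python 2.7 and Python 3 (where unichr() no longer exists and chr() covers the same role).

```python
import re

# \u escapes keep this source pure ASCII, so the console never has to
# transcode the curly quotes. U+2018 / U+2019 are the left / right
# single quotation marks.
QUOTES_RE = re.compile(u'[`\u2018\u2019"]+')

# Characters that are unsafe in file names, as in the question's inner regex.
# "+" instead of "*" avoids pointless empty matches; the result is the same.
UNSAFE_RE = re.compile(u'[\r\n/:?<>|*%]+')


def sanitized_test_name(orig_name):
    """Remove quotes and file-name-unsafe characters from a unicode string."""
    return QUOTES_RE.sub(u'', UNSAFE_RE.sub(u'', orig_name))


print(sanitized_test_name(u'\u2018it\u2019s a "test": a/b?'))  # its a test ab
```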