aneroid

Python error in Console but not in File: unexpected character after line continuation character

I've got a Python script which has a class defined with this method:

@staticmethod
def _sanitized_test_name(orig_name):
    return re.sub(r'[`‘’\"]*', '', re.sub(r'[\r\n\/\:\?\<\>\|\*\%]*', '', orig_name.encode('utf-8')))

I can run the script from the command prompt without any issues. But when I paste the code of the full class into the console, I get SyntaxError: unexpected character after line continuation character:

>>> return re.sub(r'[`‘’\"]*', '', re.sub(r'[\r\n\/\:\?\<\>\|\*\%]*', '', orig_name.encode('utf-8')))
  File "<stdin>", line 1
    return re.sub(r'[``'\"]*', '', re.sub(r'[\r\n\/\:\?\<\>\|\*\%]*', '', orig_name.encode('utf-8')))
                                                                                                    ^
SyntaxError: unexpected character after line continuation character

If I skip that method while pasting, everything else works. Note that my original line differs from what the error shows: r'[`‘’\"]*' vs r'[``'\"]*'. Replacing that with ur'[`‘’"]*' gives SyntaxError: EOL while scanning string literal instead.

It seems the Python console is seeing the ‘ as a stylised ` (backtick) and the ’ as a stylised ' (single quote), when I really mean the Unicode open and close quotes. I've got # -*- coding: utf-8 -*- at the top of my script, which I paste into the console as well.

Answers (1)

aneroid

Focusing on just the expression causing the error, r'[`‘’"]*':

>>> r'[`‘’"]*'
  File "<stdin>", line 1
    r'[``'"]*'
             ^
SyntaxError: EOL while scanning string literal
>>> ur'[`‘’"]*'  # with the unicode modifier
  File "<stdin>", line 1
    ur'[``'"]*'
              ^
SyntaxError: EOL while scanning string literal

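If the terminal I'm in doesn't accept Unicode input, ‘ gets turned into ` and ’ into '. That stray ' closes the string literal right after [``, so the rest of the line is parsed as code: here the " starts a string that never ends (EOL while scanning string literal), and in the original method the \ before the " is read as a line continuation character (unexpected character after line continuation character).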
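A minimal sketch to reproduce this outside the console, assuming Python 3 (the original is Python 2, but both reject the mangled lines with a SyntaxError). It simulates the bad terminal by replacing ‘ with ` and ’ with ', then uses the built-in compile() to check whether each line still parses. The helpers compiles and mangle are hypothetical names made up for this example.

```python
def compiles(source):
    """Hypothetical helper: True if `source` is syntactically valid Python."""
    try:
        compile(source, "<paste>", "exec")
        return True
    except SyntaxError:
        return False


def mangle(source):
    """Hypothetical helper simulating the terminal: U+2018 -> backtick, U+2019 -> '."""
    return source.replace("\u2018", "`").replace("\u2019", "'")


# The answer's expression and the question's (with the escaped double quote),
# spelled with \u escapes so this file itself stays pure ASCII.
answer_expr = "x = r'[`\u2018\u2019\"]*'"      # x = r'[`‘’"]*'
question_expr = "x = r'[`\u2018\u2019\\\"]*'"  # x = r'[`‘’\"]*'

for src in (answer_expr, question_expr):
    # Original line parses; the mangled line does not.
    print(compiles(src), compiles(mangle(src)))
```

Both iterations print True False: the ' produced from ’ ends the raw string after [``, leaving " (or \") outside any string literal.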

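So the workaround is to split the regex and use unichr() with the code points of the two quotes, U+2018 and U+2019. These are hexadecimal, so they have to be passed as 0x2018 and 0x2019 (unichr(2018) gives U+07E2 instead):

>>> r'[`' + unichr(0x2018) + unichr(0x2019) + r'"]*'
u'[`\u2018\u2019"]*'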

(And the raw string prefix r'' isn't actually needed for these two pieces, since neither contains a backslash.)
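Another way to sidestep the terminal entirely is to keep the source ASCII-only and spell the quotes with \uXXXX escapes (in Python 2 that would be inside a u'' literal). Below is a minimal sketch in Python 3, where chr() replaces unichr() and str is already Unicode. It assumes the sanitizer can work on text rather than on the UTF-8 bytes that orig_name.encode('utf-8') produced in the original; sanitized_test_name and the module-level pattern names are hypothetical.

```python
import re

# U+2018 LEFT SINGLE QUOTATION MARK and U+2019 RIGHT SINGLE QUOTATION MARK,
# spelled as \u escapes so this source file stays pure ASCII.
# The code points are hexadecimal: chr(0x2018), not chr(2018).
LEFT_QUOTE = "\u2018"
RIGHT_QUOTE = "\u2019"

# Same characters as the original inner re.sub: CR, LF, / : ? < > | * %
_PATH_CHARS = re.compile(r"[\r\n/:?<>|*%]+")
# Same characters as the original outer re.sub: backtick, both curly quotes, double quote
_QUOTE_CHARS = re.compile("[`" + LEFT_QUOTE + RIGHT_QUOTE + '"]+')


def sanitized_test_name(orig_name):
    """Hypothetical Python 3 stand-in for _sanitized_test_name, working on str."""
    without_path_chars = _PATH_CHARS.sub("", orig_name)
    return _QUOTE_CHARS.sub("", without_path_chars)


print(sanitized_test_name("it\u2019s a \u2018test\u2019: a/b?"))  # prints: its a test ab
```

Using + instead of the original * gives the same result, since replacing an empty match with '' changes nothing; it just avoids the zero-width matches.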
