QA Collective

Reputation: 2419

How to store backslash unescaped string in a variable?

Most existing questions about unescaping strings seem to be about Python 2, or about unescaping Unicode character codes.

I have a string being returned from LDAP that seems to be 'double escaped':

>>> escaped = "hello\\,world"

I want to unescape this string and store it into another variable, but decode doesn't return what I expect:

>>> escaped.encode().decode('unicode_escape')
'hello\\,world'

The result of a print() however returns what I want:

>>> print(escaped)
hello\,world

I know I can capture the result of that print to an IO stream, but surely there is a more elegant solution than that?

Upvotes: 0

Views: 828

Answers (1)

jsbueno

Reputation: 110228

'hello\\,world' is not doubly escaped. When showing the internal representation (the "repr") of a string, Python escapes backslashes so that you, the person viewing this representation, know that \\ represents one actual backslash character inside the string, and not an escape sequence for another character.

When you call print, the string is rendered through another method, which is meant for program output, i.e. for the users of the program to consume. In this representation, the "\\" is properly rendered as "\", and other sequences, such as "\n", "\t", "\b", are rendered as the real characters they represent ("\x0a", "\x09" and "\x08" in this case, or "LINE FEED", "TAB" and "BACKSPACE").

The former is rendered by calling the object's __repr__ method, and it is what any Python interactive environment uses to show the result of expressions. The latter rendering, used by print, calls the object's __str__ method instead. In code, instead of calling these methods directly, one should call the built-ins repr(...) and str(...) respectively.
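A minimal sketch of that distinction, using the string from the question. The variable names as_str and as_repr are only illustrative. It suggests that the string already holds a single backslash, so there is nothing left to unescape:

```python
escaped = "hello\\,world"    # the \\ in the literal stores ONE backslash

as_str = str(escaped)        # the text that print() writes
as_repr = repr(escaped)      # the text the interactive prompt echoes

print(as_str)                # hello\,world
print(as_repr)               # 'hello\\,world'
print(len(escaped))          # 12
print(escaped.count("\\"))   # 1

# Code points of the characters produced by "\n", "\t" and "\b"
print([hex(ord(c)) for c in "\n\t\b"])   # ['0xa', '0x9', '0x8']
```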

Also, by using f-strings it is easy to interpolate the desired view of an object in another text-snippet. If you want the "str" view, just place the object as an expression between {} inside the f-string. If the internal representation is desired, before the closing }, include the !r sequence:

In [192]: a = "Hello\world!"

In [193]: a
Out[193]: 'Hello\\world!'

In [194]: print(a)
Hello\world!

In [195]: print(repr(a))
'Hello\\world!'

In [196]: print(f"*{a}*{a!r}*")
*Hello\world!*'Hello\\world!'*

As you can see, even when you type a single "\", if the character following it does not form a known escape sequence, the "\" is taken as a literal backslash, but it is still shown as "\\", because we humans are under no obligation to know by heart which escape sequences are valid and which are not. On the other hand, typing a single "\" to mean a backslash in a string literal is quite dangerous, as there is a big chance of creating an unintended other character. In Python 3.8 (currently in beta), this even yields a syntax warning:

Python 3.8.0b2+ (heads/3.8:028f1d2479, Jul 17 2019, 22:42:16) 
[GCC 9.1.1 20190503 (Red Hat 9.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = "hello\world!"
<stdin>:1: SyntaxWarning: invalid escape sequence \w
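The same warning can be observed from a script by compiling the literal at run time. This is only a sketch: the name source and the file name "<demo>" are made up for the example, and the warning category may be DeprecationWarning or SyntaxWarning depending on the interpreter version, so the code does not assume either one:

```python
import warnings

# The compiled text is:  a = "hello\world!"   (a single backslash before w)
source = 'a = "hello\\world!"'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")          # do not let any filter hide it
    code = compile(source, "<demo>", "exec")

for w in caught:
    print(w.category.__name__, w.message)    # category varies by version

namespace = {}
exec(code, namespace)
print(namespace["a"])                        # hello\world!
```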

The way to avoid this warning is to either always type a double \\ or use the r prefix for the string:

>>> a = r"hello\world!"
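As a quick check, both spellings produce the same string object contents. The names doubled and raw below are illustrative only:

```python
doubled = "hello\\world!"    # backslash escaped by doubling it
raw = r"hello\world!"        # raw literal: the backslash is taken as typed

print(doubled == raw)        # True
print(len(raw))              # 12
print(raw)                   # hello\world!
```

One caveat with the raw form: a raw string literal cannot end in an odd number of backslashes, so a value that ends in "\" still has to be written with the doubled form.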

Upvotes: 2
