narnie

Reputation: 1772

Converting double backslash to single backslash in Python 3

I have a string like so:

>>> t
'\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0'

I made it using a function that converts Unicode text to the corresponding Python escape sequences. When I want to convert it back, I can't get rid of the double backslash so that it is interpreted as Unicode again. How can this be done?

>>> t = unicode_encode("
>>> t
'\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0'
>>> print(t)
\u0048\u0065\u006c\u006c\u006f\u0020\u20ac\u0020\u00b0    
>>> t.replace('\\','X')
'Xu0048Xu0065Xu006cXu006cXu006fXu0020Xu20acXu0020Xu00b0'
>>> t.replace('\\', '\\')
'\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0'

Of course, I can't do this, either:

>>> t.replace('\\', '\')
  File "<ipython-input-155-b46c447d6c3d>", line 1
    t.replace('\\', '\')
                         ^
SyntaxError: EOL while scanning string literal

Upvotes: 6

Views: 6849

Answers (3)

dylnmc

Reputation: 4010

Since a backslash is an escape character and you are searching for two backslashes, you need to replace four backslashes with two, i.e.:

t.replace("\\\\", "\\")

This will replace every r"\\" with r"\". The r indicates a raw string. So, for example, if you type print(r"\\") into IDLE or any Python script (or print r"\\" in Python 2) you will get \\. This means that every "\\" is really just an r"\".
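A quick check of the raw-string equivalence described above, as a small sketch (here `t` is retyped from the question's output). It also shows, as a side observation, that the question's `t` holds only single backslashes, so this particular four-to-two replace has nothing to match in it:

```python
# A raw string and a normal string with escaped backslashes are the same value.
assert r"\\" == "\\\\"   # both are two backslash characters
assert len(r"\\") == 2
assert len("\\") == 1    # "\\" in source code is ONE backslash

t = "\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0"
print(t.count("\\"))                 # 9: one backslash per escape sequence
print("\\\\" in t)                   # False: no two backslashes are adjacent
print(t.replace("\\\\", "\\") == t)  # True: the replace leaves t unchanged
```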

user1632861 suggested that you use .replace("\\", ""), but this replaces every r"\" with nothing. Try the above method instead. :D

In this case, however, it appears as though you are reading/receiving data, and you probably want to use the correct encoding and then decode to Unicode (as RocketDonkey's answer suggests).

Upvotes: 0

user1632861

Reputation:

You only have one backslash in your string, but backslashes are represented as \\. As you can see, when you use print(), there's only one backslash. So if you want to get rid of one of the two backslashes, don't do anything: it's not there. If you want to get rid of both, just remove the single one that is actually there. Again, use \\ to represent one backslash: t.replace("\\", "")

So your string never has two backslashes in the first place, and that shouldn't be the problem.
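To make the repr-versus-content distinction concrete, here is a minimal sketch using one escape from the question's string (the variable name `s` is just for illustration):

```python
s = "\\u0048"               # source literal: the backslash is escaped once

print(len(s))               # 6 characters: \ u 0 0 4 8
print(repr(s))              # '\\u0048'  (repr doubles the backslash for display)
print(s)                    # \u0048     (print shows the single real backslash)
print(s[0] == "\\")         # True: the first character is one backslash
print(s.replace("\\", ""))  # u0048: removing that one backslash removes "both"
```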

Upvotes: -1

RocketDonkey

Reputation: 37249

Not sure if this is appropriate for your situation, but you could try using unicode_escape:

>>> t
'\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0'
>>> type(t)
<class 'str'>
>>> enc_t = t.encode('utf_8')
>>> enc_t
b'\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0'
>>> type(enc_t)
<class 'bytes'>
>>> dec_t = enc_t.decode('unicode_escape')
>>> type(dec_t)
<class 'str'>
>>> dec_t
'Hello € °'

Or in abbreviated form:

>>> t.encode('utf_8').decode('unicode_escape')
'Hello € °'

You take your string and encode it using UTF-8, and then decode it using unicode_escape.
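One caveat, offered as a hedged sketch rather than a drop-in fix: the unicode_escape codec reads its input bytes as Latin-1, so encoding with UTF-8 first only round-trips cleanly when the string is pure ASCII, as t is here. If the string might also contain non-ASCII characters that are not escaped, encoding with 'ascii' and the 'backslashreplace' error handler turns those characters into escapes too. The helper name unescape_unicode is hypothetical:

```python
def unescape_unicode(escaped: str) -> str:
    """Hypothetical helper: turn literal escapes like '\\u20ac' into characters."""
    # 'backslashreplace' rewrites any non-ASCII character already present
    # (e.g. 'é') as its own escape ('\\xe9'), so the unicode_escape codec,
    # which treats raw bytes as Latin-1, cannot turn it into mojibake.
    raw = escaped.encode("ascii", "backslashreplace")
    return raw.decode("unicode_escape")


t = "\\u0048\\u0065\\u006c\\u006c\\u006f\\u0020\\u20ac\\u0020\\u00b0"
print(unescape_unicode(t))            # Hello € °
print(unescape_unicode("é \\u20ac"))  # é €

# For comparison, the UTF-8 route garbles the unescaped 'é':
print("é \\u20ac".encode("utf_8").decode("unicode_escape"))  # Ã© €
```

Either way, any other backslash sequence in the string (such as \n) will also be interpreted as an escape, so this only suits strings that really are escape-encoded.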

Upvotes: 9
