bl79

Reputation: 1429

Replace backslash '\' in string

I have a plain string 'бекслеш \018 на точку' in Python 3. I got this string from an external HTML page, so it doesn't have the "r" prefix of a raw string, and I don't know how to convert it to a raw string.

How can I replace the '\' with a dot '.'?

I've tried the following:

s = get_string()  # 'бекслеш \018 на точку'
print(s.replace('\\', '.'))
out: бекслеш 8 на точку

But I need 'бекслеш .018 на точку'.

Update: It is clear that the programming language interprets the backslash as a control character. So the question is: how can I make the replacement when I can't write the string as a raw literal and don't know how to convert it to one?

Upvotes: 0

Views: 610

Answers (2)

pylang

Reputation: 44505

I think you actually want to replace the control character:

Code

print(s.replace("\x01", ".01"))
# бекслеш .018 на точку

Details

Quoting the question: "It is clear that the programming language interprets the backslash as a control character."

Actually, the backslash is not a character in the string at all. The escape sequence \01 (the backslash plus the octal digits 01) produces a single control character. Let's see how Python looks at each character:

print(list(s))
# ['б', 'е', 'к', 'с', 'л', 'е', 'ш', ' ', '\x01', '8', ' ', 'н', 'а', ' ', 'т', 'о', 'ч', 'к', 'у']

Notice \x01 is one character, not the backslash alone. You have to replace this entire character.


Addendum

Therefore, a general approach is to iterate over each character and substitute any that belong to a Unicode control ("C") category with a new string. This new string should be formatted to mirror the value of the character it replaces. All other characters are left unchanged.

from unicodedata import category


"".join(".{:02d}".format(ord(char)) if category(char).startswith("C") else char for char in s)
# 'бекслеш .018 на точку'
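One caveat with the one-liner above: it formats the code point in decimal, while an escape such as \01 is written in octal, so the two only agree for code points below 8. A minimal sketch of an octal variant follows. The helper name escape_control_chars and its prefix parameter are hypothetical, and the two-digit width is an assumption, since the original number of digits in the escape cannot be recovered from the string.

```python
from unicodedata import category


def escape_control_chars(text, prefix="."):
    """Replace every category-C character with prefix + its octal code.

    Hypothetical helper: the width of two octal digits is an assumption,
    because '\\1', '\\01' and '\\001' all produce the same character.
    """
    parts = []
    for char in text:
        if category(char).startswith("C"):
            # Control, format, surrogate, private-use or unassigned character.
            parts.append("{}{:02o}".format(prefix, ord(char)))
        else:
            parts.append(char)
    return "".join(parts)


s = 'бекслеш \018 на точку'
print(escape_control_chars(s))       # бекслеш .018 на точку
print(escape_control_chars('a\nb'))  # a.12b (newline is octal 12)
```

Note that this also rewrites ordinary control characters such as newlines and tabs, which may or may not be wanted.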

Upvotes: 2

Olivier Melançon

Reputation: 22314

The difference between a regular string literal and a raw string literal is only in how the source code is interpreted to create a string object. The objects they create are not distinct in any way, so there is no such thing as converting a string to a raw string.
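As a small illustration of that point, the sketch below builds the same four-character string from a raw literal and from a regular literal with an escaped backslash. The variable names raw and cooked are just for illustration.

```python
raw = r'\018'      # raw literal: the backslash is kept as a character
cooked = '\\018'   # regular literal: '\\' is an escaped backslash

print(raw == cooked)               # True
print(type(raw) is type(cooked))   # True, both are plain str objects
print(len(raw))                    # 4
```

Once the object exists, nothing records which kind of literal created it.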

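In this case, '\018' stands for '\x01', which is the Start of Heading (SOH) character, followed by the character '8'. The escape \01 is read as an octal code, and it stops before the 8 because 8 is not an octal digit.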

chr(1) + '8' == '\x018' # True

And as you can see, your string contains no '\\' character.

'\\' in 'бекслеш \018 на точку' # False
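By contrast, escape sequences are only processed in source-code literals. If the text really arrives at runtime with a backslash in it, the backslash is an ordinary character and the original replace call works. A minimal sketch, where payload is a hypothetical stand-in for the bytes of the external page:

```python
# Hypothetical stand-in for bytes received from an external page.
# '\\' in this literal puts one real backslash into the data.
payload = 'бекслеш \\018 на точку'.encode('utf-8')

text = payload.decode('utf-8')   # decoding does not interpret escapes
print('\\' in text)              # True

replaced = text.replace('\\', '.')
print(replaced)                  # бекслеш .018 на точку
```

So if the replacement has no effect on the real data, it is worth checking repr(s) to see which characters the string actually contains.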

Upvotes: 3
