Add a non-escaped escape character to a Python bytearray

Question

I have an API that demands that the quotation marks in my XML attributes are escaped, so a plain " will not work; it requires \".

I have tried iterating through my string, for example:

b'
SetChLevelC30'

Each time that I encounter a " (ascii 34) I will replace it with an escape character (ascii 92) and another quote. Infuriatingly this results in:

b'
SetChLevelC30'

where the escapes have been escaped. As a sanity check I replaced 92 with any other character and it works as expected.

temp = b'
\
SetChLevelC30'

i = 0
j = 0
payload = bytearray(len(temp) + 4)

for char in temp:
    if char == 34:
        payload[i] = 92
        i += 1
        payload[i] = 34
        i += 1
        j += 1
    else:
        payload[i] = temp[j]
        i += 1
        j += 1

print(bytes(payload))

I would assume that character 92 would appear once but something is escaping the escape!

Grismar · Accepted Answer

Your problem is the result of a very common misunderstanding for programmers new to Python.

When showing the representation of a string (or bytes) on the console, Python escapes the escape character (\) so that the text it shows, when used in Python as a literal, would give you the exact same value.

So:

s = 'abc\\abc'
print(s)

Prints abc\abc, but on the interpreter you get:

>>> s = 'abc\\abc'
>>> print(s)
abc\abc
>>> s
'abc\\abc'

Note that this is correct. After all print(s) should show the string on the console as it is, while s on the interpreter is asking Python to show you the representation of s, which includes the quotes and the escape characters.

Compare:

>>> repr(s)
"'abc\\\\abc'"

Here the interpreter shows the representation of repr(s), i.e. the representation of the representation of s.
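If in doubt about how many backslashes a string really holds, counting characters is more reliable than reading the displayed form. A minimal sketch, using the same sample string as above:

```python
# The literal uses \\ to write a single backslash character.
s = 'abc\\abc'

print(len(s))          # 7: the backslash counts once
print(s.count('\\'))   # 1
print(s)               # abc\abc     (the actual content)
print(repr(s))         # 'abc\\abc'  (the representation, backslash doubled)
```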

For bytes, things are further complicated because you get the representation even when using print: print writes strings, so to see the actual content a bytes value needs to be decoded first, i.e.:

>>> print(some_bytes.decode('utf-8'))  # or whatever the encoding is

In short: your code was doing what you wanted it to. It does not duplicate escape characters; you only thought it did because you were looking at the representation of the bytes, not the actual bytes content.
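This can be checked directly by looking at the byte values instead of the printed form. A small sketch with a made-up four-byte value (not the payload from the question):

```python
# Four bytes: 'a', one backslash (92), one double quote (34), 'b'.
data = b'a\\"b'

print(data)                   # b'a\\"b'  (representation, backslash shown doubled)
print(list(data))             # [97, 92, 34, 98]
print(data.count(b'\\'))      # 1
print(data.decode('utf-8'))   # a\"b
```

The list of integers shows a single 92 in front of the 34, even though the representation displays two backslashes.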

By the way, this also means that you don't have to be paranoid and go through the trouble of writing custom code to replace characters based on their ASCII values; you can simply:

>>> example = bytes('"test"', encoding='utf-8')  # illustrative input containing quotes
>>> result = example.replace(b'"', b"\\\"")
>>> print(result.decode('utf-8'))
\"test\"

I won't pretend that b"\\\"" is intuitive, perhaps b'\\"' is better - but both require that you understand the difference between the representation of a string and its printed value.

So, finally:

>>> example = b'"test"'  # illustrative input containing quotes
>>> result = example.replace(b'"', b'\\"')
>>> print(result.decode('utf-8'))
\"test\"
