UTF-8 to byte representation as a string in Python

Question

I have the following piece of code:

test = "é".encode('utf-8')
print(test)

This prints b'\xc3\xa9', as expected. What I actually want is "\xc3\xa9" as a string, that is, the escape sequences as literal text without the b'' wrapper. How can I achieve this?

I looked at encoding and decoding methods in Python, but unfortunately they do not result in the desired outcome.

Edwin van Mierlo · Accepted Answer

You can use either repr() or str():

# -*- coding: utf-8 -*-
test = "é".encode('utf-8')
print(test)

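# using repr(): slice off the leading b' and the trailing '
my_string = repr(test)[2:-1]
print(my_string)

# using str(): without an encoding argument it returns the same text as repr()
my_string = str(test)[2:-1]
print(my_string)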

output:

b'\xc3\xa9'
\xc3\xa9
\xc3\xa9
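One caveat with the slicing approach: repr() shows printable ASCII bytes as plain characters, so b'a' becomes a rather than \x61. If every byte should appear as a \xNN escape, the string can be built by hand. The following is only a sketch; the names sliced and explicit are made up for illustration and are not part of the original answer.

```python
test = "é".encode("utf-8")

# Slicing approach from the answer: drop the leading b' and the trailing '
sliced = repr(test)[2:-1]

# Illustrative alternative: format every byte as a \xNN escape explicitly
explicit = "".join(f"\\x{byte:02x}" for byte in test)

print(sliced)
print(explicit)
```

For this input both lines print the same text. They differ only when the bytes contain printable ASCII characters, quotes, or backslashes.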

Just a little background to this.

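The repr() function calls the __repr__() method of the bytes object test. The str() function calls its __str__() method instead; when a class does not define __str__(), the default implementation falls back to __repr__(). For a bytes object, str() without an encoding argument returns the same text as repr(), which is why both approaches above give identical output.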
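A quick way to check this for the bytes object itself is sketched below; the variable name same_text is only illustrative.

```python
test = "é".encode("utf-8")

# str() without an encoding argument returns the informal representation,
# which for bytes is the same text that repr() produces
same_text = str(test) == repr(test)

print(same_text)
print(repr(test))
```

Note that running Python with the -b flag makes str() on a bytes object emit a BytesWarning, so repr() is arguably the safer of the two when only the representation is wanted.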

The fallback from __str__() to __repr__() is easy to see with a small class:

class MyClass(object):

    def __init__(self):
        pass

    def __repr__(self):
        return 'repr'

    def __str__(self):
        return 'str'

m = MyClass()
print(str(m))
print(repr(m))

output:

str
repr

If no __str__() is defined, str() falls back to __repr__(). Consider the following code:

class MyClass(object):

    def __init__(self):
        pass

    def __repr__(self):
        return 'repr'

    #def __str__(self):
    #    return 'str'

m = MyClass()
print(str(m))
print(repr(m))

output:

repr
repr

More information about __str__() and __repr__() can be found in the Data model section of the Python documentation.
