LazyCat

Reputation: 506

Properly converting special characters in a Python byte string

I tried looking through a few similar threads, but I'm still confused:

I have a byte string containing some special characters (in my case a closing double quotation mark, ”), like the one below. What's the easiest way to convert it to a string so that the special characters are decoded correctly?

b = b'My groovy str\xe2\x80\x9d is now fixed'

Update: regarding decode('utf-8')

>>> b = b'My groovy str\xe2\x80\x9d is now fixed'
>>> b_converted = b.decode("utf-8") 
>>> b_converted
'My groovy str\u201d is now fixed'
>>> print(b_converted)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u201d' in position 13: ordinal not in range(128)

Upvotes: 0

Views: 3453

Answers (2)

Mark Tolonen

Reputation: 177891

Use .decode(encoding) on a byte string to convert it to Unicode.

The encoding cannot always be determined from the bytes alone; it depends on where the data came from. In this case it is clearly UTF-8.
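A minimal sketch of the decode step using the byte string from the question. It also shows why the encoding matters: a wrong guess such as Latin-1 (chosen here purely for illustration) does not raise, it silently produces three garbage characters, while a strict ASCII decode fails. ascii() is used for display so the output is safe even on an ASCII-only console.

```python
import unicodedata

raw = b'My groovy str\xe2\x80\x9d is now fixed'

# E2 80 9D is the UTF-8 encoding of the single code point U+201D.
text = raw.decode("utf-8")
print(ascii(text))                  # 'My groovy str\u201d is now fixed'
print(unicodedata.name(text[13]))   # RIGHT DOUBLE QUOTATION MARK
print(len(raw), len(text))          # 29 27: three bytes became one character

# A wrong guess may not raise: Latin-1 maps every byte to some character,
# so this "succeeds" but yields three wrong characters instead of one.
wrong = raw.decode("latin-1")
print(len(wrong))                   # 29

# A strict ASCII decode rejects the non-ASCII bytes outright.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii decode failed at byte", exc.start)   # 13
```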

Ideally, when reading text, the API used to read the data lets you specify the encoding or, for web requests, detects it from the response headers, so you don't need to call .decode explicitly. For example:

with open('input.txt', encoding='utf8') as file:
    text = file.read()

or

import requests
response = requests.get('http://example.com')
print(response.encoding)
print(response.text) # str decoded from response.content using the detected encoding
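A self-contained sketch of the file case. The temporary file is a hypothetical stand-in for input.txt, written with the question's UTF-8 bytes; only the open(..., encoding="utf8") call reflects the answer. The requests example is not reproduced because it needs network access (there, response.content holds the raw bytes and response.text the decoded str).

```python
import os
import tempfile

raw = b'My groovy str\xe2\x80\x9d is now fixed'

# Hypothetical input file: write the question's UTF-8 bytes to a temp file.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

try:
    # With encoding= given, open() decodes for you and read() returns a str.
    with open(path, encoding="utf8") as file:
        text = file.read()
finally:
    os.remove(path)

print(type(text).__name__, ascii(text))  # str 'My groovy str\u201d is now fixed'
```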

Upvotes: 2

David Duran

Reputation: 1826

The following should work:

b_converted = b.decode("utf-8") 

Converted from:

b'My groovy str\xe2\x80\x9d is now fixed'

To:

My groovy str” is now fixed
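The output above assumes a console that can display ”. In the question's update, the decode step already succeeded; the UnicodeEncodeError is raised by print, which must encode the str again using the console's encoding, there ASCII. A minimal sketch, using io.TextIOWrapper over an in-memory buffer as a stand-in for the console (an assumption for illustration, not a real terminal):

```python
import io

text = b'My groovy str\xe2\x80\x9d is now fixed'.decode("utf-8")

# Simulate a stdout whose encoding is ASCII, as in the question's update.
ascii_out = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
ascii_failed = False
try:
    print(text, file=ascii_out)
    ascii_out.flush()
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii stream:", exc)  # same message as the question's traceback

# The identical str prints fine to a UTF-8 stream ("\n" kept untranslated).
buf = io.BytesIO()
utf8_out = io.TextIOWrapper(buf, encoding="utf-8", newline="\n")
print(text, file=utf8_out)
utf8_out.flush()
data = buf.getvalue()
print(data)  # b'My groovy str\xe2\x80\x9d is now fixed\n'
```

So the fix for that traceback is on the output side, for example a UTF-8 console locale, the PYTHONIOENCODING=utf-8 environment variable, or sys.stdout.reconfigure(encoding="utf-8") on Python 3.7+; which one applies depends on the environment.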

Upvotes: 2
