Reputation: 506
I looked through a few similar threads, but I'm still confused:
I have a byte string containing a special character (a right double quotation mark, in my case), as shown below. What's the easiest way to convert it to a string so that the special character is mapped correctly?
b = b'My groovy str\xe2\x80\x9d is now fixed'
Update: regarding decode('utf-8')
>>> b = b'My groovy str\xe2\x80\x9d is now fixed'
>>> b_converted = b.decode("utf-8")
>>> b_converted
'My groovy str\u201d is now fixed'
>>> print(b_converted)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u201d' in position 13: ordinal not in range(128)
Upvotes: 0
Views: 3453
Reputation: 177891
Use .decode(encoding) on a byte string to convert it to a Unicode string.
The encoding cannot always be determined from the bytes alone; it depends on the source. In this case it is clearly UTF-8.
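As a minimal sketch of what the decode step does with the bytes from the question: the three bytes \xe2\x80\x9d are the UTF-8 encoding of the single character U+201D. The variable names below are illustrative only, and the latin-1 line is there just to show what a wrong guess at the encoding looks like.

```python
raw = b'My groovy str\xe2\x80\x9d is now fixed'

# Decoding as UTF-8 collapses the three bytes into one character.
text = raw.decode('utf-8')
print(len(raw), len(text))      # byte count vs. character count
print(ascii(text[13]))          # the character at position 13, shown as an escape

# Decoding with the wrong codec does not fail here, it just gives wrong text.
wrong = raw.decode('latin-1')
print(ascii(wrong[13:16]))
```

Using ascii() for the output keeps the example printable even on a terminal that cannot display the quote character itself.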
Ideally, the API used to read the data lets you specify the encoding or, in the case of website requests, detects it from the response headers, so you don't need to call .decode explicitly. For example:
with open('input.txt', encoding='utf8') as file:
    text = file.read()
or
import requests
response = requests.get('http://example.com')
print(response.encoding)
print(response.text)  # body decoded using response.encoding
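To illustrate the file case, here is a small sketch showing that opening a file in text mode with an explicit encoding gives the same result as reading the bytes and decoding them by hand. The temporary file is only a stand-in for input.txt, and its contents are the bytes from the question.

```python
import os
import tempfile

raw = b'My groovy str\xe2\x80\x9d is now fixed'

# Write the raw bytes to a temporary file (a stand-in for input.txt).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Text mode: open() decodes the bytes for us.
    with open(path, encoding='utf-8') as file:
        text_from_open = file.read()

    # Binary mode: read the bytes and decode them manually.
    with open(path, 'rb') as file:
        text_manual = file.read().decode('utf-8')
finally:
    os.remove(path)

print(text_from_open == text_manual)  # True
```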
Upvotes: 2
Reputation: 1826
The following should work:
b_converted = b.decode("utf-8")
Converted from:
b'My groovy str\xe2\x80\x9d is now fixed'
To:
My groovy str” is now fixed
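Note that the UnicodeEncodeError in the question's update is raised by print, not by decode: the traceback shows the 'ascii' codec failing to encode the character, which suggests the output stream in that session uses ASCII. A rough sketch of the difference, assuming you only need something printable rather than the real quote character:

```python
text = b'My groovy str\xe2\x80\x9d is now fixed'.decode('utf-8')

# This is the step an ASCII-only output stream has to perform, and it fails.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Escaping unencodable characters instead of failing always prints.
safe = text.encode('ascii', errors='backslashreplace').decode('ascii')
print(ascii_ok)
print(safe)
```

If the terminal can actually display UTF-8, configuring the output encoding (for example with the PYTHONIOENCODING environment variable, or sys.stdout.reconfigure(encoding='utf-8') on Python 3.7+) is likely a better fix than escaping.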
Upvotes: 2