I am reading a text file containing a single word, B\xc3\xa9zier.
I want to convert this to its decoded UTF-8 form, i.e. Bézier, and print it to the console.
My code is as follows:
foo = open("test.txt")
for line in foo.readlines():
    for word in line.split():
        print(word.decode('utf-8'))
foo.close()
The output is:
B\xc3\xa9zier
However, if I do something like this in the interpreter:
>>> print('B\xc3\xa9zier'.decode('utf-8'))
I get the correct output:
Bézier
I can't figure out why this happens.
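A quick way to see why the two cases differ is to compare the strings directly. The sketch below uses Python 3, where a bytes literal plays the role of a Python 2 str; the raw string stands in for what is read from test.txt (an assumption based on the output shown above, which prints the backslashes literally):

```python
# Assumption: test.txt holds the ASCII text B\xc3\xa9zier, backslashes included.
file_text = r"B\xc3\xa9zier"   # raw string: backslashes kept as-is, as when read from the file
literal = b"B\xc3\xa9zier"      # bytes literal: the parser turns each \xNN into a single byte

print(len(file_text))           # 13: each escape is 4 characters (\, x, two hex digits)
print(len(literal))             # 7: B, 0xC3, 0xA9, z, i, e, r
print(literal.decode("utf-8"))  # Bézier
```

The interpreter example works because the parser does the escape processing; text read from a file never goes through that step.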
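It seems the file contains the escape sequences as literal text (a backslash, an x and two hex digits per byte), not the UTF-8 bytes themselves. When you type 'B\xc3\xa9zier' at the interpreter, the parser turns each \xNN into a single byte, which is why that version decodes correctly. Use the string_escape codec to perform the same conversion on the text read from the file, then decode the result as UTF-8: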
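with open('test.txt') as f:
    for line in f:
        for word in line.split():
            # string_escape turns the literal text \xc3\xa9 into real bytes,
            # which can then be decoded as UTF-8 (Python 2 only)
            print(word.decode('string_escape').decode('utf-8'))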
Bézier
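The string_escape codec exists only in Python 2. A minimal Python 3 sketch of the same idea, assuming the file is plain ASCII apart from the \xNN escapes, reads the file in binary mode and uses unicode_escape followed by latin-1 to recover the raw bytes. The helper name decode_escaped_utf8 and the temporary demo file are illustrative, not part of the original answer:

```python
import os
import tempfile


def decode_escaped_utf8(raw):
    # raw: bytes from the file, e.g. b"B\\xc3\\xa9zier" with literal backslashes.
    # unicode_escape turns each \xNN escape into the code point U+00NN;
    # latin-1 maps U+0000..U+00FF back to bytes 0x00..0xFF, recovering the
    # original UTF-8 byte sequence, which is then decoded normally.
    return raw.decode("unicode_escape").encode("latin-1").decode("utf-8")


words = []
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical demo file mirroring test.txt from the question.
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(b"B\\xc3\\xa9zier\n")

    with open(path, "rb") as f:        # binary mode: backslashes stay untouched
        for line in f:
            for word in line.split():  # bytes.split() splits on ASCII whitespace
                words.append(decode_escaped_utf8(word))

for word in words:
    print(word)  # Bézier
```

Because unicode_escape treats any non-ASCII input byte as Latin-1 and also interprets other escapes such as \n or \t, a file that mixes real UTF-8 characters with escapes, or contains unrelated backslashes, would be decoded incorrectly; the sketch only covers the escaped-ASCII case from the question.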