user2374515

Python encoding issue in reading from text file

I am reading a text file that contains a single word, B\xc3\xa9zier.

I wish to convert this to its decoded UTF-8 form, i.e. Bézier, and print it to the console.

My code is as follows:

foo = open("test.txt")
for line in foo.readlines():
    for word in line.split():
        print(word.decode('utf-8'))
foo.close()

The output is:

B\xc3\xa9zier

However, if I do something like this:

>>> print('B\xc3\xa9zier'.decode('utf-8'))

I get the correct output:

Bézier

I am unable to figure out why this is happening.

Upvotes: 1

Views: 149

Answers (1)

jamylak

Reputation: 133504

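It seems as though your file contains the escape sequences as literal text (a backslash, then x, c, 3 and so on), not the two UTF-8 bytes they describe. In your interactive example, Python's parser turns the \xc3\xa9 escapes in the string literal into real bytes before decode('utf-8') runs, which is why that version works. Decode the escapes with string_escape first (a Python 2 codec), then decode the result as UTF-8: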

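with open('test.txt') as f:
    for line in f:
        for word in line.split():
            # string_escape turns the literal \xNN text into real bytes;
            # the second decode interprets those bytes as UTF-8.
            print(word.decode('string_escape').decode('utf-8'))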


Bézier
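
If you are on Python 3, note that string_escape and str.decode no longer exist there, so the snippet above will not run as is. A rough equivalent might look like the sketch below. The helper name decode_escaped_utf8 is made up for illustration, and the approach assumes the text in the file contains only characters up to U+00FF (otherwise the latin-1 step raises an error).

```python
# Python 3 sketch; decode_escaped_utf8 is a hypothetical helper name.
def decode_escaped_utf8(text):
    # Step 1: str -> bytes. latin-1 maps code points 0-255 to the same
    #         byte values, so it acts as a lossless bridge here.
    # Step 2: unicode_escape interprets the literal \xNN sequences.
    # Step 3: back to bytes, again through latin-1.
    raw = text.encode('latin-1').decode('unicode_escape').encode('latin-1')
    # Step 4: the resulting bytes are the real UTF-8 data.
    return raw.decode('utf-8')


# What the file actually holds: a backslash, 'x', 'c', '3', and so on.
from_file = r'B\xc3\xa9zier'

if __name__ == '__main__':
    print(len(from_file))  # 13 characters, not the 6 of the decoded word
    print(decode_escaped_utf8(from_file))
```

The raw string r'B\xc3\xa9zier' stands in for a word read from test.txt. It makes the difference from the interactive example visible: without the r prefix, the parser would have converted the escapes already.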

Upvotes: 6
