frazman

Reputation: 33273

Weird characters in string

I am reading some data from a file.

But I am seeing some weird characters in the output, such as:

'tamb\xc3\xa9m', 'f\xc3\xbcr','cari\xc3\xb1o'

My file read code is fairly standard:

with open(filename) as f:
    for line in f:
        print line

Upvotes: 3

Views: 11075

Answers (1)

Martijn Pieters

Reputation: 1123450

You have UTF-8 encoded data. You could decode the data:

with open(filename) as f:
   for line in f:
       print line.decode('utf8')

or use io.open() to have Python decode the contents for you, as you read:

import io

with io.open(filename, encoding='utf8') as f:
   for line in f:
       print line
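As a rough, self-contained sketch of the io.open() approach: the temporary file and the sample words below are only there for illustration, and the example assumes the file really is UTF-8. It is written so that it should run on both Python 2 and Python 3.

```python
import io
import os
import tempfile

# Write the sample words as raw UTF-8 bytes to a throwaway file
# (a stand-in for the real data file; illustrative only).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'tamb\xc3\xa9m\nf\xc3\xbcr\ncari\xc3\xb1o\n')

# io.open() decodes while reading, so each line is already Unicode text.
with io.open(path, encoding='utf8') as f:
    words = [line.rstrip(u'\n') for line in f]

os.remove(path)

# Lengths are counted in characters, not bytes: the two-byte
# sequences such as \xc3\xa9 have each become a single character.
print([len(word) for word in words])  # [6, 3, 6]
```

If the file turns out not to be UTF-8, decoding would likely raise UnicodeDecodeError or produce the wrong characters, so the encoding argument has to match how the file was written.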

Your data, decoded:

>>> print 'tamb\xc3\xa9m'.decode('utf8')
também
>>> print 'f\xc3\xbcr'.decode('utf8')
für
>>> print 'cari\xc3\xb1o'.decode('utf8')
cariño

You appear to have printed string representations (the output of the repr() function), which produce string literal syntax suitable for pasting back into your Python interpreter. \xhh hex escapes are used for characters outside of the printable ASCII range. Python containers such as list or dict also use repr() to show their contents when printed.
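The difference between a representation and the underlying data can be seen in a small sketch. The variable names here are illustrative, and a bytes literal is used so that the snippet should behave the same way on Python 2 and Python 3 (on Python 3 the representation carries an extra b prefix).

```python
raw = b'tamb\xc3\xa9m'       # UTF-8 bytes, as they come out of the file
text = raw.decode('utf8')    # the decoded Unicode text

# repr() escapes each non-ASCII byte as \xhh.
print(repr(raw))

# Printing a container shows the repr() of each item, escapes included.
print([raw])

# The escapes are only notation: \xc3\xa9 is two bytes encoding one character.
print(len(raw))   # 7 bytes
print(len(text))  # 6 characters
```

Printing text itself, rather than a list containing it, should display the word with its accented character, provided the terminal can show it.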

You may want to read up on Unicode and how it interacts with Python.

Upvotes: 11
