Rckt

Reputation: 185

Reading non-ASCII characters from a text file

I'm using Python 2.7. I've tried several things, including the codecs module, but none of them worked. How can I fix this?

myfile.txt

sözcük

My code

f = open('myfile.txt','r')
for line in f:
    print line
f.close()

Output

s\xc3\xb6zc\xc3\xbck

The output is the same in Eclipse and in the command window. I'm on Windows 7. Non-ASCII characters display fine when they don't come from a file.

Upvotes: 6

Views: 27763

Answers (3)

lavrton

Reputation: 20308

  1. First, detect the file's encoding:

  from chardet import detect
  encoding = lambda x: detect(x)['encoding']
  print encoding(line)

  2. Then convert the line to unicode, or to a str in your default encoding:

  n_line=unicode(line,encoding(line),errors='ignore')
  print n_line
  print n_line.encode('utf8')
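
If chardet is not installed, the same detect-then-decode idea can be roughly approximated with the standard library alone. The sketch below is a stand-in for chardet, not a use of it: it tries a short list of candidate encodings in order and keeps the first one that decodes without error. The function name `decode_bytes` and the candidate list are assumptions made for illustration; the code is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Hypothetical candidate list (an assumption): put the strictest encodings first.
CANDIDATES = ('utf-8', 'cp1254', 'latin-1')


def decode_bytes(raw, candidates=CANDIDATES):
    """Return (text, encoding) for the first candidate that decodes raw cleanly."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Only reached when every candidate failed (latin-1 never fails, so this
    # needs a custom candidate list): decode anyway and mark the bad bytes.
    return raw.decode('utf-8', 'replace'), 'utf-8'


# The bytes shown in the question decode cleanly as UTF-8.
text, used = decode_bytes(b's\xc3\xb6zc\xc3\xbck')
print(used)
```

Guessing works better on the whole file's bytes (opened with mode 'rb') than on a single line, because a short line may contain too few non-ASCII bytes to tell encodings apart.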

Upvotes: 7

Biruk Demelash

Reputation: 161

import codecs
# open the file with UTF-8 decoding
f = codecs.open("myfile.txt", "r", encoding='utf-8')
# read the whole file into a unicode string
sfile = f.read()
f.close()

# check the type of what was read
print type(sfile)  # <type 'unicode'>

# a unicode string should be encoded to a standard str to display it properly
print sfile.encode('utf-8')

# check the type of the encoded string
print type(sfile.encode('utf-8'))  # <type 'str'>
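
A variant of the same approach, as a minimal sketch: `io.open` also accepts an `encoding` argument on Python 2.6+ and is the built-in `open` on Python 3, so the code below runs on both. The helper name `read_text` and the temporary demo file are assumptions made to keep the example self-contained; they are not part of the original answer.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import io
import os
import tempfile


def read_text(path, encoding='utf-8'):
    """Read a whole file and return it as decoded text (unicode on Python 2)."""
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


# Self-contained demo: write the UTF-8 bytes from the question to a temp file.
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(b's\xc3\xb6zc\xc3\xbck\n')

text = read_text(demo_path)
os.remove(demo_path)

print(len(text))  # 7: six characters plus the newline
```

The length is counted in characters, not bytes, which shows that the two-byte sequences for ö and ü were each decoded into a single character.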

Upvotes: 16

jgomo3

Reputation: 1223

It's the terminal encoding. Configure your terminal to use the same encoding as your file. I recommend UTF-8.

By the way, it is good practice to decode all input and encode all output to avoid problems:

f = open('test.txt','r')    
for line in f:
    l = unicode(line, encoding='utf-8')  # decode the input
    print l.encode('utf-8')  # encode the output
f.close()
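
The output half of this advice can be sketched as follows, assuming the terminal's encoding is reported by `sys.stdout.encoding`. The helper name `to_terminal_bytes` is hypothetical. It encodes text for whatever encoding the terminal reports and replaces characters the terminal cannot represent instead of raising an error. The code runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys


def to_terminal_bytes(text, encoding=None):
    """Encode text for a terminal, replacing characters it cannot represent."""
    if encoding is None:
        # sys.stdout.encoding can be None, for example when output is piped.
        encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(encoding, 'replace')


word = u's\xf6zc\xfck'  # the word from the question, written with escapes

print(repr(to_terminal_bytes(word, 'utf-8')))  # two bytes each for ö and ü
print(repr(to_terminal_bytes(word, 'ascii')))  # ö and ü become '?'
```

If the terminal is not set to UTF-8, encoding to its actual code page this way avoids the garbled output that results from writing UTF-8 bytes to it directly.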

Upvotes: 1
