Reputation: 235
I have a word êtes in two files and I tried converting it into different formats.
1) I opened the file with codecs.open('test1.txt', encoding='ISO-8859-2') and then did word.encode('utf-8'). The word read as \xc4\x99tes
2) I opened another file with the same word, but with codecs.open('test2.txt', encoding='utf-8'). This time the word read as \xeates
Shouldn't both give the same output?
Upvotes: 0
Views: 143
Reputation: 308530
No, they should not give the same output. The first is a byte string (the result of calling encode('utf-8')), and the second is a Unicode string that was never encoded.
It appears your first file is encoded with ISO-8859-1, not ISO-8859-2. The ê (\xea) is being translated into ę (\u0119) instead, and its UTF-8 representation is the two bytes \xc4\x99.
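To see the effect in isolation, here is a minimal sketch. It is written for Python 3 (the question appears to use Python 2) and assumes, as suggested above, that the file stores the word as ISO-8859-1 bytes. The variable names are only illustrative.

```python
# Assumption: the file on disk holds "êtes" as ISO-8859-1 (Latin-1) bytes.
raw = "êtes".encode("iso-8859-1")        # b'\xeates', ê is the single byte 0xEA

# Decoding the same bytes with two different codecs gives two different strings.
as_latin1 = raw.decode("iso-8859-1")     # 0xEA -> U+00EA (ê)
as_latin2 = raw.decode("iso-8859-2")     # 0xEA -> U+0119 (ę)

# Re-encoding each string as UTF-8 shows where \xc4\x99 comes from.
utf8_from_latin2 = as_latin2.encode("utf-8")   # b'\xc4\x99tes'
utf8_from_latin1 = as_latin1.encode("utf-8")   # b'\xc3\xaates'

print(utf8_from_latin2)
print(utf8_from_latin1)
```

The bytes \xc4\x99tes from the question match the ISO-8859-2 path, which is consistent with the file having been decoded with the wrong codec.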
The second file appears to be properly encoded in UTF-8. If you want to see the actual character rather than its hex representation, you need to print it.
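The difference between the escaped form and the character itself can be sketched as follows. This is a Python 3 illustration, not the asker's Python 2 session: in Python 3 the built-in ascii() produces the escaped form that Python 2 showed when echoing a Unicode string.

```python
# "\xeates" is a four-character string: U+00EA (ê) followed by "tes".
word = "\xeates"

# ascii() returns the escaped representation, using only ASCII characters.
escaped = ascii(word)
print(escaped)        # shows the \xea escape, not the character

# print(word) would write the character itself, provided the terminal's
# encoding can represent it.
```

The escape \xea here denotes the code point U+00EA, not a raw byte, which is why it looks different from the UTF-8 byte sequence in the first case.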
Upvotes: 1