Reputation: 1586
I have a UTF-8 encoded file which I need to save as a CP1250 encoded file. So I did the following:
import codecs

# Read file as UTF-8
with codecs.open("utf.htm", "r", 'utf-8') as sourceFile:
    # Write file as CP1250
    with codecs.open('win1250.htm', "w", "cp1250", "xmlcharrefreplace") as targetFile:
        while True:
            contents = sourceFile.read()
            if not contents:
                break
            targetFile.write(contents)
When I inspect the unicode string contents in my editor, all the characters seem to be fine. But when I open the final file in Notepad, the text is displayed incorrectly. For instance, instead of the letter ř I get the symbol ø. Any ideas what is going wrong here?
Thanks
Upvotes: 1
Views: 5908
Reputation: 298046
Notepad probably thinks the file holds text encoded with CP-1252:
>>> 'ř'.encode('cp1250').decode('cp1250')
'ř'
>>> 'ř'.encode('cp1250').decode('cp1252')
'ø'
If that is the case, the file itself is written correctly and the problem is with how Notepad interprets it. Use a text editor where you can specify the encoding manually, like Notepad++.
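To check this without relying on any editor, you can inspect the raw bytes and read the file back yourself. Below is a minimal sketch, assuming Python 3; the helper names `convert_encoding` and `read_as` are made up for illustration, and the file names are the ones from the question.

```python
def convert_encoding(src_path, dst_path, src_enc="utf-8", dst_enc="cp1250"):
    """Re-encode a text file (hypothetical helper, not a library function).

    Characters that do not exist in dst_enc are written as XML character
    references, matching the "xmlcharrefreplace" handler from the question.
    newline="" keeps line endings exactly as they are in the source file.
    """
    with open(src_path, "r", encoding=src_enc, newline="") as src, \
         open(dst_path, "w", encoding=dst_enc,
              errors="xmlcharrefreplace", newline="") as dst:
        # Copy in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: src.read(64 * 1024), ""):
            dst.write(chunk)


def read_as(path, encoding):
    """Read a whole file, decoding it with the given encoding."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()


if __name__ == "__main__":
    convert_encoding("utf.htm", "win1250.htm")
    # Decoded as CP1250 the text should match the source; decoded as
    # CP1252 you should see the same wrong characters Notepad shows.
    print(read_as("win1250.htm", "cp1250")[:200])
    print(read_as("win1250.htm", "cp1252")[:200])
```

If the CP1250 read-back looks right and the CP1252 one shows ø where ř should be, the conversion worked and only the viewer's encoding guess is off.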
Upvotes: 2