m3div0

Reputation: 1586

Python convert UTF8 file to CP1250

I have a UTF-8 encoded file that I need to save as a CP1250 encoded file. This is what I tried:

import codecs
# Read file as UTF-8
with codecs.open("utf.htm", "r", 'utf-8') as sourceFile:
    # Write file as CP1250
    with codecs.open('win1250.htm', "w", "cp1250", "xmlcharrefreplace") as targetFile:
        # read() without a size argument returns the whole file in one call
        contents = sourceFile.read()
        targetFile.write(contents)

When I inspect the unicode string contents in my editor, all the characters seem to be fine. But when I open the final file in Notepad, the file does not appear to be written correctly. For instance, instead of the letter ř I get the symbol ø. Any ideas what is going wrong here?

Thanks

Upvotes: 1

Views: 5908

Answers (1)

Blender

Reputation: 298046

Notepad probably thinks the file holds text encoded with CP-1252:

>>> 'ř'.encode('cp1250').decode('cp1250')
'ř'
>>> 'ř'.encode('cp1250').decode('cp1252')
'ø'

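So this is most likely a problem with Notepad rather than with your code: the byte that CP1250 uses for ř is the same byte that CP-1252 uses for ø, and Notepad is decoding it with the wrong code page. Use a text editor where you can specify the encoding manually, like Notepad++.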
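If you want to confirm this without relying on any editor, you can read the written file back as raw bytes and decode it yourself. The following is only a sketch, assuming Python 3; the helper names `convert_utf8_to_cp1250` and `read_as` and the temporary file names are made up for illustration and are not part of the question's code.

```python
import os
import tempfile


def convert_utf8_to_cp1250(src_path, dst_path):
    """Re-encode a UTF-8 text file as CP1250 (hypothetical helper)."""
    # newline="" keeps the original line endings untouched in both directions
    with open(src_path, "r", encoding="utf-8", newline="") as src:
        text = src.read()
    # Characters that CP1250 cannot represent become XML character references
    with open(dst_path, "w", encoding="cp1250",
              errors="xmlcharrefreplace", newline="") as dst:
        dst.write(text)


def read_as(path, encoding):
    """Decode the raw bytes of a file with the given encoding (hypothetical helper)."""
    with open(path, "rb") as f:
        return f.read().decode(encoding)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "utf.htm")
    dst = os.path.join(tmp, "win1250.htm")

    # Build a tiny UTF-8 input file containing the problem character
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write("ř")

    convert_utf8_to_cp1250(src, dst)

    with open(dst, "rb") as f:
        raw = f.read()                    # the single byte 0xF8
    as_cp1250 = read_as(dst, "cp1250")    # decoded as intended
    as_cp1252 = read_as(dst, "cp1252")    # decoded the way Notepad seems to guess

print(raw, as_cp1250, as_cp1252)
```

If the text decoded as CP1250 matches the original, the file on disk is fine and only the viewer's choice of code page is wrong.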

Upvotes: 2
