Ala Cris

Reputation: 41

Python keeps showing cp1250 character encoding in files

I have an exercise to write a script which converts UTF-16 files to UTF-8, so I wanted to have one example file with UTF-16 encoding. The problem is that the encoding Python shows me for every file is 'cp1250' (no matter the format, .csv or .txt). What am I missing here? I also have example files from the Internet, but Python recognizes them as cp1250. Even when I save a file as UTF-8, Python shows the cp1250 encoding.

This is the code I use:

    with open('FILE') as f:
        print(f.encoding)

Upvotes: 0

Views: 1303

Answers (1)

tripleee

Reputation: 189317

The result from open is simply a file object in your system's default encoding; f.encoding reports the encoding Python will use to decode the file, not the actual encoding of the bytes on disk. Python makes no attempt to detect that. To open the file in something else, you have to specifically say so.
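
For example, a minimal sketch (the file name 'FILE' is just a placeholder; the file has to exist for open to succeed):

    import locale

    # With no encoding argument, open() falls back to the locale's
    # preferred encoding, which is exactly what f.encoding echoes back.
    print(locale.getpreferredencoding(False))  # e.g. 'cp1250' on Central European Windows

    # Pass encoding= explicitly and f.encoding reports that instead;
    # Python never inspects the bytes to guess the file's real encoding.
    with open('FILE', encoding='utf-8') as f:
        print(f.encoding)  # prints 'utf-8' no matter what the file contains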

To actually convert a file, try something like

    # Decode as cp1252, write the same text back out as UTF-16LE.
    # (infile/outfile rather than input/output, to avoid shadowing builtins.)
    with open('input', encoding='cp1252') as infile, \
            open('output', 'w', encoding='utf-16le') as outfile:
        for line in infile:
            outfile.write(line)

Converting a legacy 8-bit file to Unicode isn't really useful because it only exercises a small subset of the character set. See if you can find a good "hello world" sample file. https://www.w3.org/2001/06/utf-8-test/UTF-8-demo.html is one for UTF-8.
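Since the exercise in the question goes the other way, here is a minimal sketch of UTF-16 to UTF-8, assuming the input was saved as UTF-16 with a BOM (the file names are placeholders):

    # The 'utf-16' codec (without a le/be suffix) consumes the BOM
    # and picks the right byte order automatically.
    with open('in_utf16.txt', encoding='utf-16') as src, \
            open('out_utf8.txt', 'w', encoding='utf-8') as dst:
        for line in src:
            dst.write(line)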

Upvotes: 1
