JaMo

Reputation: 105

UTF-8 encoding in Python gets transformed to ASCII?

I'm attempting to do something very simple: read a file that is in ASCII or utf-8-sig and save it as UTF-8. However, when I run the function below and then run file filename.json on Linux, it always reports the file as ASCII. I have tried using codecs, with no luck either. The only way I can get it to work is to replace utf-8 with utf-8-sig, but then the file starts with a BOM. I've searched around for solutions and found some that strip the leading characters, but once that is done the file becomes ASCII again. I have tried everything here: Convert UTF-8 with BOM to UTF-8 with no BOM in Python

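def file_converter(file_path):
    # utf-8-sig decodes plain ASCII and UTF-8 alike, dropping a leading BOM if present
    with open(file_path, mode='r', encoding='utf-8-sig') as f:
        s = f.read()
    with open(file_path, mode='w', encoding='utf-8') as f:
        f.write(s)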

Upvotes: 2

Views: 868

Answers (1)

hobbs

Reputation: 239672

Files that contain only characters below U+0080 encode to exactly the same bytes in ASCII and in UTF-8 (this was one of the compatibility goals of UTF-8). The file command detects the file as ASCII, and it is, but it is also valid UTF-8 and will decode correctly as UTF-8 (just like any ASCII file will). So nothing at all is wrong.
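To see this for yourself, here is a minimal sketch. The helper name encodes_identically and the sample strings are made up for illustration; only str.encode and the standard codec names are real API.

```python
def encodes_identically(text: str) -> bool:
    """Return True if text produces the same bytes in ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # At least one character is at or above U+0080, so the text
        # cannot be represented in ASCII at all.
        return False


# Only characters below U+0080: the bytes on disk are identical.
print(encodes_identically('{"name": "JaMo"}'))

# One non-ASCII character: UTF-8 needs a multi-byte sequence for it.
print(encodes_identically('{"name": "José"}'))

# utf-8-sig differs only by the three BOM bytes it writes first.
print("abc".encode("utf-8-sig"))
```

Once the file contains even one character at or above U+0080, the written bytes are no longer pure ASCII, and a detector like file should then have something to tell the two encodings apart by.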

Upvotes: 3
