Reputation: 105
I'm attempting to do something very simple, which is read a file in ascii or utf-8-sig and save it as utf-8. However, when I run the function below, and then do file filename.json
in linux, it always shows the file as being ASCII. I have tried using codecs, and no luck either. The only way I can get it to work, is if I replace utf-8
with utf-8-sig
, BUT that gives me the issue that the file has BOM endings. I've searched around for solutions, and I found some removing the beginning characters, however, after this is performed, the file becomes ascii again. I have tried everything her: Convert UTF-8 with BOM to UTF-8 with no BOM in Python
def file_converter(file_path):
s = open(file_path, mode='r', encoding='ascii').read()
open(file_path, mode='w', encoding='utf-8').write(s)
Upvotes: 2
Views: 868
Reputation: 239672
Files that only contain characters below U+0080 encode to exactly the same bytes as either ASCII or UTF-8 (this was one of the compatibility goals of UTF-8). file
detects the file as ASCII, and it is, but it's also UTF-8, and will decode correctly as UTF-8 (just like any ASCII file will). So nothing at all is wrong.
Upvotes: 3