Todd Shannon

Reputation: 567

How to read a binary file to text in Python 3.9

I have a .sql file that I want to read into my Python session (Python 3.9). I'm opening it using the file context manager.

with open('file.sql', 'r') as f:
    text = f.read()

When I print the text, I still get binary characters, i.e. \xff\xfe\r\x00\n\x00-\x00-... and so on.

I've tried arguments such as 'rb', encoding='utf-8', etc., but the result is still binary text. It should be noted that I've used this very same procedure many times in my code before and it has not been a problem.

Did something change in python 3.9?

Upvotes: 1

Views: 1813

Answers (1)

furas

Reputation: 142671

The first two bytes \xff\xfe look like a BOM (Byte Order Mark), and the table on the Wikipedia page for BOM shows that \xff\xfe can mean the encoding UTF-16-LE.

So you could try:

with open('file.sql', 'r', encoding='utf-16-le') as f:
    text = f.read()
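
As a side note, Python's plain utf-16 codec uses the BOM to pick the byte order and strips it from the decoded text, so you don't have to hard-code -le. A minimal sketch, assuming the file really starts with that \xff\xfe mark:

# 'utf-16' reads the BOM, picks little/big endian from it and drops it
with open('file.sql', 'r', encoding='utf-16') as f:
    text = f.read()

print(text[:50])  # no leading \ufeff and no \x00 bytes in the output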

EDIT:

There is also the module chardet, which you can try to use to detect the encoding:

import chardet

with open('file.sql', 'rb') as f:  # read bytes
    data = f.read()

info = chardet.detect(data)
print(info['encoding'])

text = data.decode(info['encoding'])
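
By the way, the dictionary returned by chardet.detect() also carries a confidence score, so a hedged variant (just a sketch, the 0.5 threshold is an arbitrary example, not a chardet recommendation) could refuse to decode when the guess is weak:

import chardet

with open('file.sql', 'rb') as f:  # read bytes
    data = f.read()

info = chardet.detect(data)
print(info)  # e.g. {'encoding': 'UTF-16', 'confidence': 1.0, 'language': ''}

if info['encoding'] and info['confidence'] > 0.5:  # arbitrary cut-off for illustration
    text = data.decode(info['encoding'])
else:
    print('encoding guess too weak:', info)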

Usually files don't have a BOM, but if they do then you can try to detect it using the example from unicodebook.readthedocs.io/guess_encoding/check-for-bom-markers:

from codecs import BOM_UTF8, BOM_UTF16_BE, BOM_UTF16_LE, BOM_UTF32_BE, BOM_UTF32_LE

BOMS = (
    (BOM_UTF8, "UTF-8"),
    (BOM_UTF32_BE, "UTF-32-BE"),
    (BOM_UTF32_LE, "UTF-32-LE"),
    (BOM_UTF16_BE, "UTF-16-BE"),
    (BOM_UTF16_LE, "UTF-16-LE"),
)

def check_bom(data):
    return [encoding for bom, encoding in BOMS if data.startswith(bom)]

# ---------

with open('file.sql', 'rb') as f:  # read bytes
    data = f.read()

encoding = check_bom(data)
print(encoding)

if encoding:
    text = data.decode(encoding[0])
else:
    print('unknown encoding')
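
One detail about the snippet above: decoding with an explicit ...-LE/...-BE codec keeps the BOM as a leading \ufeff character in the string. A small variant (only a sketch reusing the BOMS table and data from above; check_bom2 is just an illustrative name) can also return the matching BOM and cut it off before decoding. Because BOMS lists the UTF-32 marks before the UTF-16 ones, the first match is the longest one:

def check_bom2(data):
    # return (bom, encoding) pairs whose BOM matches the start of the data
    return [(bom, encoding) for bom, encoding in BOMS if data.startswith(bom)]

matches = check_bom2(data)

if matches:
    bom, encoding = matches[0]                # longest matching BOM first
    text = data[len(bom):].decode(encoding)   # drop the BOM bytes before decoding
else:
    print('unknown encoding')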

Upvotes: 1
