Determine the type of the result of `file.read()` from `file` in Python

Question

I have some code that operates on a file object in Python.

Following Python 3's string/bytes revolution, if file was opened in binary mode, file.read() returns bytes. Conversely, if file was opened in text mode, file.read() returns str.

In my code, file.read() is called multiple times and therefore it is not practical to check for the result-type every time I call file.read(), e.g.:

def foo(file_obj):
    while True:
        data = file_obj.read(1)
        if not data:
            break
        if isinstance(data, bytes):
            # do something for bytes
            ...
        else:  # isinstance(data, str)
            # do something for str
            ...

What I would like to have instead is some way of reliably checking what the result of file.read() will be, e.g.:

def foo(file_obj):
    if is_binary_file(file_obj):
        # do something for bytes
        while True:
            data = file_obj.read(1)
            if not data:
                break
            ...
    else:
        # do something for str
        while True:
            data = file_obj.read(1)
            if not data:
                break
            ...

A possible way would be to check file_obj.mode e.g.:

import io


def is_binary_file(file_obj):
    return 'b' in file_obj.mode


print(is_binary_file(open('test_file', 'w')))
# False
print(is_binary_file(open('test_file', 'wb')))
# True
print(is_binary_file(io.StringIO('ciao')))
# AttributeError: '_io.StringIO' object has no attribute 'mode'
print(is_binary_file(io.BytesIO(b'ciao')))
# AttributeError: '_io.BytesIO' object has no attribute 'mode'

which would fail for the objects from io like io.StringIO() and io.BytesIO().

Another way, which would also work for io objects, would be to check for the encoding attribute, e.g.:

import io


def is_binary_file(file_obj):
    return not hasattr(file_obj, 'encoding')


print(is_binary_file(open('test_file', 'w')))
# False
print(is_binary_file(open('test_file', 'wb')))
# True
print(is_binary_file(io.StringIO('ciao')))
# False 
print(is_binary_file(io.BytesIO(b'ciao')))
# True

Is there a cleaner way to perform this check?

norok2 · Accepted Answer

After a bit more homework, I can probably answer my own question.

First of all, a general remark: checking for the presence/absence of an attribute/method as a hallmark for the whole API is not a good idea because it will lead to more complex and still relatively unsafe code.

Following the EAFP/duck-typing mindset it may be OK to check for a specific method, but it should be the one used subsequently in the code.

The problem with file.read() (and even more so with file.write()) is that it comes with side effects that make it impractical to just try using it and see what happens.

For this specific case, while still following the duck-typing mindset, one could exploit the fact that the first parameter of read() can be set to 0. This will not actually read anything from the buffer (and it will not change the result of file.tell()), but it will give an empty str or bytes. Hence, one could write something like:

import io


def is_reading_bytes(file_obj):
    return isinstance(file_obj.read(0), bytes)


print(is_reading_bytes(open('test_file', 'r')))
# False
print(is_reading_bytes(open('test_file', 'rb')))
# True
print(is_reading_bytes(io.StringIO('ciao')))
# False 
print(is_reading_bytes(io.BytesIO(b'ciao')))
# True
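
To double-check the claim about side effects, here is a minimal sketch using in-memory streams. The helper name probe_read_type is hypothetical (it is not part of the io module); it just records tell() before and after the zero-length read:

```python
import io


def probe_read_type(file_obj):
    """Return (type of read() results, position before, position after)."""
    before = file_obj.tell()
    empty = file_obj.read(0)  # zero-length read: gives '' or b''
    after = file_obj.tell()
    return type(empty), before, after


text_stream = io.StringIO('ciao')
text_stream.read(2)  # move away from the start first
text_result = probe_read_type(text_stream)

binary_stream = io.BytesIO(b'ciao')
binary_stream.read(2)
binary_result = probe_read_type(binary_stream)

print(text_result)
# (<class 'str'>, 2, 2)
print(binary_result)
# (<class 'bytes'>, 2, 2)
```

In both cases the position is the same before and after the probe, so the remaining data can still be read normally afterwards.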

Similarly, one could try writing an empty bytes object b'' with the write() method:

import io


def is_writing_bytes(file_obj):
    try:
        file_obj.write(b'')
    except TypeError:
        return False
    else:
        return True


print(is_writing_bytes(open('test_file', 'w')))
# False
print(is_writing_bytes(open('test_file', 'wb')))
# True
print(is_writing_bytes(io.StringIO('ciao')))
# False 
print(is_writing_bytes(io.BytesIO(b'ciao')))
# True

Note that those methods will not check for readability / writability.
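
If that matters, one possible way to add the check is to ask the object first via readable() / writable(), which are part of the io.IOBase interface. The following is only a sketch: the *_checked names are hypothetical, and it assumes the object actually provides these two methods (a purely duck-typed object might not). Raising io.UnsupportedOperation is one reasonable choice here, not the only one.

```python
import io
import os
import tempfile


def is_reading_bytes_checked(file_obj):
    """Like is_reading_bytes(), but reject streams that cannot be read."""
    if not file_obj.readable():
        raise io.UnsupportedOperation('file object is not readable')
    return isinstance(file_obj.read(0), bytes)


def is_writing_bytes_checked(file_obj):
    """Like is_writing_bytes(), but reject streams that cannot be written."""
    if not file_obj.writable():
        raise io.UnsupportedOperation('file object is not writable')
    try:
        file_obj.write(b'')
    except TypeError:
        return False
    return True


# A file opened with 'wb' is writable but not readable.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test_file')
    with open(path, 'wb') as write_only:
        write_is_bytes = is_writing_bytes_checked(write_only)
        try:
            is_reading_bytes_checked(write_only)
        except io.UnsupportedOperation:
            read_rejected = True
        else:
            read_rejected = False

print(write_is_bytes)
# True
print(read_rejected)
# True
```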

Finally, one could implement a proper type-checking approach by inspecting the file-like object API. A file-like object in Python must support the API described in the io module. The documentation mentions that TextIOBase is used for files opened in text mode, while BufferedIOBase (or RawIOBase for unbuffered streams) is used for files opened in binary mode. The class hierarchy summary indicates that all of them are subclasses of IOBase. Hence the following will do the trick (remember that isinstance() checks for subclasses too):

import io


def is_binary_file(file_obj):
    return isinstance(file_obj, io.IOBase) and not isinstance(file_obj, io.TextIOBase)


print(is_binary_file(open('test_file', 'w')))
# False
print(is_binary_file(open('test_file', 'wb')))
# True
print(is_binary_file(open('test_file', 'r')))
# False
print(is_binary_file(open('test_file', 'rb')))
# True
print(is_binary_file(io.StringIO('ciao')))
# False 
print(is_binary_file(io.BytesIO(b'ciao')))
# True
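
Going back to the loop from the question, the check can then be done once before reading. This is a sketch under assumptions: read_all_chunks and its chunk_size parameter are made-up names for illustration. Since is_binary_file() also returns False for objects that are not io.IOBase instances at all, the sketch tests for io.TextIOBase explicitly instead of treating "not binary" as "text".

```python
import io


def is_binary_file(file_obj):
    return isinstance(file_obj, io.IOBase) and not isinstance(file_obj, io.TextIOBase)


def read_all_chunks(file_obj, chunk_size=1):
    """Read file_obj chunk by chunk, deciding the result type only once."""
    if is_binary_file(file_obj):
        empty = b''  # chunks will be bytes
    elif isinstance(file_obj, io.TextIOBase):
        empty = ''  # chunks will be str
    else:
        raise TypeError('expected a file object from the io hierarchy')
    chunks = []
    while True:
        data = file_obj.read(chunk_size)
        if not data:
            break
        chunks.append(data)
    return empty.join(chunks)


print(read_all_chunks(io.BytesIO(b'ciao')))
# b'ciao'
print(read_all_chunks(io.StringIO('ciao')))
# ciao
```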

Note that the documentation explicitly says that TextIOBase has an encoding attribute, which is not required (i.e. it is not there) for binary file objects. Hence, with the current API, checking for the encoding attribute may be a handy hack to tell whether a file object is binary for the standard classes, under the assumption that the tested object is file-like. Checking the mode attribute only works for objects backed by FileIO (such as those returned by open() for regular files); mode is not part of the IOBase / RawIOBase interface, which is why it does not work on io.StringIO() / io.BytesIO() objects.
