Reputation: 1224
I have the following Python 3 script:
from sys import argv
script, filename = argv
txt = open(filename)
print(f"Here's your file {filename}:")
print(txt.read())
When we use the built-in function open(), we open the file and get back a corresponding file object.
I understand that read() is not a built-in function, but a method of the file object.
As stated here in the Python docs about file objects https://docs.python.org/3/glossary.html#term-file-object:
There are actually three categories of file objects: raw binary files, buffered binary files and text files. Their interfaces are defined in the io module.
I'm really struggling to understand a few key areas.
1) How do I know which file object type I will be working with of raw binary, buffered binary and text files? In this example I am using a simple .txt file, so I would assume the file object would be a text file.
2) How do I know which specific read() method I am calling when I use the io module? Which class is it part of, as multiple classes have the read method available?
Please keep answers as simple as possible as I'm fairly new to Python. I just don't understand the documentation for the io module very well. I quickly become lost from step 3 onwards and need this explaining to me in simple steps.
I'm making a real effort to understand the logical steps to navigate the documentation, so please amend these steps as appropriate.
My understanding is as follows:
1) I call the open() function.
2) open() uses the io module to work with the file object.
3) As it is a text file, the class involved is TextIOBase.
4) io.TextIOBase is used, which has various methods such as read() available.
Upvotes: 0
Views: 704
Reputation: 77902
"In this example I am using a simple .txt file, so I would assume the file object would be a text file."
This is totally unrelated.
The extension is only a naming convention. It has absolutely nothing to do with the actual content, which from a purely technical point of view is always made of bytes anyway (the difference is in how you interpret those bytes), and it has nothing to do with which IO class open() will use either; cf. deceze's complete and excellent answer.
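To make that concrete, here is a minimal sketch. The file names are made up for the demonstration and are created in a temporary directory; the same bytes are written under three different names and each file is opened with a plain open(path). If the extension mattered, the resulting class would differ between them.

```python
import os
import tempfile

results = {}

# Hypothetical file names, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("notes.txt", "notes.jpg", "notes"):
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(b"hello")  # identical bytes in every file
        with open(path) as f:  # default mode is text, whatever the name
            results[name] = (type(f).__name__, f.read())

for name, (class_name, content) in results.items():
    print(name, class_name, repr(content))
```

All three files come back as the same kind of text file object, because only the arguments passed to open() are taken into account.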
Upvotes: 0
Reputation: 9130
How do I know which file object type I will be working with of raw binary, buffered binary and text files? In this example I am using a simple .txt file, so I would assume the file object would be a text file.
You don’t. But there are ways to identify/guess a file’s content type quite similar to Linux’s file command. For example, take a look at the python-magic package:
import magic
m = magic.Magic(mime=True)
print(m.from_file(filename))
This would give you the MIME type of a file, e.g. application/json, and then you’d know whether to read it as a text or binary file.
Whether you read the file buffered or unbuffered depends on how you open it; see also the io module.
The other answers provide more details on the IO, so I’m not going into this here… 😉
Upvotes: 0
Reputation: 884
It is all about how you open the file.
If you call open(path), you will open path as a text file object. If you call open(path, 'rb'), you will open it as a buffered binary file. If you call open(path, 'rb', buffering=0), you will open it as an unbuffered (raw) binary file. Simple as that =)
Please refer to https://docs.python.org/3/library/io.html for more information.
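If you want to check this yourself, the sketch below opens one file in the three ways described and records the class of the object that open() hands back. The helper opened_type and the file name example.txt are hypothetical, introduced only for this illustration.

```python
import io
import os
import tempfile


def opened_type(path, *args, **kwargs):
    """Open path with the given arguments and return the file object's class."""
    with open(path, *args, **kwargs) as f:
        return type(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical demo file
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello\n")

    text_type = opened_type(path)                      # text mode
    buffered_type = opened_type(path, "rb")            # buffered binary
    raw_type = opened_type(path, "rb", buffering=0)    # raw (unbuffered) binary

print(text_type.__name__, buffered_type.__name__, raw_type.__name__)
```

The names printed are the concrete classes from the io module, so type(f) is the quickest way to find out which read() you are about to call.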
Upvotes: 0
Reputation: 522075
There are certain things which are identical between any file object, and you can see that in the class hierarchy. All of the file objects have IOBase as their base class, which defines things which are common to all file objects. It then specialises into RawIOBase, BufferedIOBase and TextIOBase classes, which then further specialise into FileIO and BytesIO and whatnot. It's a typical OOP class hierarchy.
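As a rough illustration of that hierarchy, the sketch below pairs three concrete classes from the io module with the base class each one belongs to, and checks the relationship with issubclass. The choice of these three pairs is mine; they are the classes open() commonly returns when reading.

```python
import io

# (concrete class, the category base class it belongs to)
pairs = [
    (io.TextIOWrapper, io.TextIOBase),
    (io.BufferedReader, io.BufferedIOBase),
    (io.FileIO, io.RawIOBase),
]

checks = []
for concrete, base in pairs:
    belongs_to_category = issubclass(concrete, base)
    is_file_object = issubclass(concrete, io.IOBase)  # common root of all of them
    checks.append((concrete.__name__, belongs_to_category, is_file_object))
    print(concrete.__name__, "->", base.__name__, belongs_to_category, is_file_object)
```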
What they all have in common is that they all define a read method. What that method does differs slightly in the details, but the overall function is the same: it reads from whatever the underlying data is and returns that data. That's typical OOP abstraction/encapsulation/polymorphism: you don't need to care how it does it or what exactly it does, you just need to know that you call .read() to get data.
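A small sketch of that idea, using the in-memory io.StringIO and io.BytesIO classes so that no real file is needed. The function name slurp is hypothetical; the point is that the caller uses .read() the same way on both objects and each class supplies its own version.

```python
import io


def slurp(file_obj):
    """Return everything left to read from any file object."""
    return file_obj.read()


text_data = slurp(io.StringIO("hello"))      # a text stream
binary_data = slurp(io.BytesIO(b"hello"))    # a binary stream

print(type(text_data).__name__, type(binary_data).__name__)
```

The same call returns a str from the text stream and bytes from the binary one, which is the "differs slightly in the details" part.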
You could instantiate those classes individually, but you typically use open to simplify that potentially complex task. open decides which class to return to you based on what exactly you requested:
Text I/O
Text I/O expects and produces str objects. This means that whenever the backing store is natively made of bytes (such as in the case of a file), encoding and decoding of data is made transparently as well as optional translation of platform-specific newline characters.
The easiest way to create a text stream is with open(), optionally specifying an encoding:
f = open("myfile.txt", "r", encoding="utf-8")
Binary I/O
Binary I/O (also called buffered I/O) expects bytes-like objects and produces bytes objects. No encoding, decoding, or newline translation is performed. [...]
The easiest way to create a binary stream is with open() with 'b' in the mode string:
f = open("myfile.jpg", "rb")
Raw I/O
Raw I/O (also called unbuffered I/O) is generally used as a low-level building-block for binary and text streams; it is rarely useful to directly manipulate a raw stream from user code. Nevertheless, you can create a raw stream by opening a file in binary mode with buffering disabled:
f = open("myfile.jpg", "rb", buffering=0)
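To see what "encoding and decoding" and "newline translation" mean in practice, here is a minimal sketch. It writes a few known bytes to a throwaway file (sample.txt is a made-up name) and reads them back once in text mode and once in binary mode.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical demo file

    # Write raw bytes: UTF-8 encoded text with a Windows-style line ending.
    with open(path, "wb") as f:
        f.write("café\r\n".encode("utf-8"))

    # Text mode: bytes are decoded to str and "\r\n" is translated to "\n".
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode: the bytes come back untouched.
    with open(path, "rb") as f:
        as_bytes = f.read()

print(repr(as_text))
print(repr(as_bytes))
```

The file on disk is identical in both cases; only the interpretation applied by the file object differs.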
Upvotes: 3