Reputation: 600
I would like to understand how readline()
takes in a single line from a text file. The specific details I would like to know about, with respect to how the compiler interprets the Python language and how this is handled by the CPU, are:
readline()
know which line of text to read, given that successive calls to readline()
read the text line by line?I am a "beginner" (I have about 4 years of "simpler" programming experience), so I wouldn't be able to understand technical details, but feel free to expand if it could help others understand!
Upvotes: 5
Views: 5898
Reputation: 4806
Example using the file file.txt
:
fake file
with some text
in a few lines
Question 1: How does the readline() know which line of text to read, given that successive calls to readline() read the text line by line?
When you open a file in python, it creates a file
object. File objects act as file descriptors, which means at any one point in time, they point to a specific place in the file. When you first open the file, that pointer is at the beginning of the file. When you call readline()
, it moves the pointer forward to the character just after the next newline it reads.
Calling the tell()
function of a file
object returns the location the file descriptor is currently pointing to.
with open('file.txt', 'r') as fd:
print fd.tell()
fd.readline()
print fd.tell()
# output:
0
10
# Or 11, depending on the line separators in the file
Question 2: Is there a way to start reading a line of text from the middle of a text? How would this work with respect to the CPU?
First off, reading a file doesn't really have anything to do with the CPU. It has to do with the operating system and the file system. Both of those determine how files can be read and written to. Barebones explanation of files
For random access in files, you can use the mmap
module of python. The Python Module of the Week site has a great tutorial.
Example, jumping to the 2nd line in the example file and reading until the end:
import mmap
import contextlib
with open('file.txt', 'r') as fd:
with contextlib.closing(mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ)) as mm:
print mm[10:]
# output:
with some text
in a few lines
Upvotes: 5
Reputation: 4139
This is a very broad question and it's unlikely that all details about what the CPU does would fit in an answer. But a high-level answer is possible:
readline
reads each line in order. It starts by reading chunks of the file from the beginning. When it encounters a line break, it returns that line. Each successive invocation of readline
returns the next line until the last line has been read. Then it returns an empty string.
with open("myfile.txt") as f:
while True:
line = f.readline()
if not line:
break
# do something with the line
Readline uses operating system calls under the hood. The file object corresponds to a file descriptor in the OS, and it has a pointer that keeps track of where in the file we are at the moment. The next read will return the next chunk of data from the file from that point on.
You would have to scan through the file first in order to know how many lines there are, and then use some way of starting from the "middle" line. If you meant some arbitrary line except the first and last lines, you would have to scan the file from the beginning identifying lines (for example, you could repeatedly call readline
, throwing away the result), until you have reached the line you want). There is a ready-made module for this: linecache
.
import linecache
linecache.getline("myfile.txt", 5) # we already know we want line 5
Upvotes: 2