Reputation: 678
I've come across two versions of code that accomplish the same task, with only a small difference between them:
with open("file") as f:
    for line in f:
        print line
and
with open("file") as f:
    data = f.readlines()
    for line in data:
        print line
My question is, is the file object f a list by default just like data? If not, why does the first chunk of code work? Which version is the better practice?
Upvotes: 9
Views: 276
Reputation: 104015
In both cases, you are reading the file line by line; only the method differs.
With your first version:
with open("file") as f:
    for line in f:
        print line
While you are iterating over the file line by line, the file contents are not fully resident in memory (unless it is a one-line file).
The open built-in function returns a file object, not a list. That object supports iteration, yielding one string per line: each string runs up to and including a newline character, or up to the end of the file for a last line with no trailing newline.
You can write a loop that is similar to what for line in f: print line is doing under the hood:
with open('file') as f:
    while True:
        try:
            line = f.next()
        except StopIteration:
            break
        else:
            print line
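For readers on Python 3, here is a minimal sketch of the same hand-written loop. It assumes Python 3 syntax (the built-in next(f) in place of f.next(), and print as a function), which this thread does not use, and the sample file and the helper name lines_via_next are made up for the demonstration.

```python
import os
import tempfile


def lines_via_next(path):
    """Collect lines by calling next() by hand, as a for loop does implicitly."""
    collected = []
    with open(path) as f:
        it = iter(f)  # for a file object this returns the file itself
        while True:
            try:
                line = next(it)  # Python 3 spelling of f.next()
            except StopIteration:
                break  # the iterator is exhausted, so the loop ends
            collected.append(line)
    return collected


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    sample_path = tmp.name

manual = lines_via_next(sample_path)

# The ordinary for loop over the file, for comparison.
with open(sample_path) as f:
    looped = [line for line in f]

os.remove(sample_path)
print(manual == looped)
```

Both approaches should produce the same lines, each with its trailing newline still attached.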
With the second version:
with open("file") as f:
    data = f.readlines()  # equivalent to data = list(f)
    for line in data:
        print line
You are using a method of a file object (file.readlines()) that reads the entire file contents into memory as a list of the individual lines. The code is then iterating over that list.
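A small sketch of that point, written in Python 3 syntax (an assumption, since the code in this thread is Python 2) and using a throwaway sample file invented for the demonstration: readlines() and list(f) both hand back an ordinary list, so the result can be indexed and measured like any other list.

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    sample_path = tmp.name

with open(sample_path) as f:
    via_readlines = f.readlines()  # whole file read into a list of lines

with open(sample_path) as f:
    via_list = list(f)  # same result, built by exhausting the iterator

os.remove(sample_path)

print(type(via_readlines).__name__)   # the type name of the result
print(via_readlines == via_list)      # do both approaches agree?
print(len(via_readlines))             # lists know their length
print(repr(via_readlines[1]))         # and support random access
```

The trade-off is memory: the whole file is held at once, which is the price of getting list behavior.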
You can also write a similar version of that loop to highlight the iterator at work under the hood:
with open('file') as f:
    data = list(f)
    it = iter(data)
    while True:
        try:
            line = it.next()
        except StopIteration:
            break
        else:
            print line
In both of your examples, you are using a for loop to loop over the items of an iterable. The items are the same in each case (the individual lines of the file), but the underlying iterable is different. In the first version, it is a file object; in the second version, it is a list. Use the first version if you just want to deal with each line. Use the second if you want a list of lines.
Read Ned Batchelder's excellent overview on looping and iteration for more.
Upvotes: 7
Reputation: 15864
A file object is not a list; it's an object that conforms to the iterator interface (docs). That is, it implements an __iter__ method that returns an iterator object. That iterator object implements both the __iter__ and next methods, allowing iteration over the collection.
It happens that the file object is its own iterator (docs), meaning file.__iter__() returns self.
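This can be checked directly. The sketch below assumes Python 3 (where the method is spelled __next__ and is called through the built-in next()) and a made-up sample file; it only illustrates the claim above and is not taken from the linked docs.

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\n")
    sample_path = tmp.name

with open(sample_path) as f:
    is_list = isinstance(f, list)    # a file object is not a list
    is_own_iterator = iter(f) is f   # __iter__() hands back the file itself
    first = next(f)                  # consume one line by hand
    rest = list(f)                   # same iterator, so it resumes after "one"

os.remove(sample_path)

print(is_list, is_own_iterator)
print(repr(first), rest)
```

Because there is only one iterator, a second pass over the same open file yields nothing unless the file is rewound or reopened.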
Both for line in file and lines = file.readlines() are equivalent in that they yield the same result when used to iterate over all lines in the file. However, file.next() buffers the contents of the file (it reads ahead) to speed up reading, which moves the file position to, or past, the point where the last returned line ended. This means that if you use for line in file, read some lines, stop the iteration before reaching the end of the file, and then call file.readlines(), the first line returned might not be the full line following the last line read by the for loop.
When you use for x in my_it, the interpreter calls my_it.__iter__(). The next() method is then called repeatedly on the object returned by that call, and each return value is assigned to x. When next() raises StopIteration, the loop ends.
Note: A valid iterator implementation should ensure that once StopIteration is raised, it continues to be raised on all subsequent calls to next().
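To make the protocol concrete, here is a minimal sketch in Python 3 syntax (an assumption: there the method is named __next__ rather than next). The Countdown class and the manual_for helper are hypothetical names invented for this illustration.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # like a file object, it is its own iterator

    def __next__(self):  # spelled next() in Python 2
        if self.current <= 0:
            raise StopIteration  # raised again on every later call
        value = self.current
        self.current -= 1
        return value


def manual_for(iterable):
    """Roughly what `for x in iterable` does, collecting each x in a list."""
    it = iter(iterable)  # calls iterable.__iter__()
    seen = []
    while True:
        try:
            x = next(it)  # calls it.__next__()
        except StopIteration:
            return seen  # the loop ends when the iterator is exhausted
        seen.append(x)


from_for = [x for x in Countdown(3)]
from_manual = manual_for(Countdown(3))

# Once exhausted, the iterator keeps raising StopIteration.
exhausted = Countdown(1)
next(exhausted)
still_raising = []
for _ in range(2):
    try:
        next(exhausted)
    except StopIteration:
        still_raising.append(True)

print(from_for, from_manual, still_raising)
```

The for loop and the hand-written loop visit the same values, and the exhausted iterator stays exhausted.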
Upvotes: 11
Reputation: 49856
A file is an iterable. Lots of objects, including lists, are iterable, which just means that they can be used in a for loop, which binds the loop variable to each object they yield in turn.
Both versions of your code iterate line by line. The second version reads the whole file into memory and constructs a list; the first may not read the whole file first. You might prefer the second if you want to close the file before something else modifies it; you might prefer the first if the file is very large.
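A sketch of the "close the file first" case, assuming Python 3 syntax and a made-up sample file: the list built by readlines() stays usable after the file is closed (or even deleted), whereas the closed file object itself can no longer be iterated.

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("red\ngreen\n")
    sample_path = tmp.name

with open(sample_path) as f:
    lines = f.readlines()  # everything is copied into memory here

# The with block has closed the file; the file can even be removed now.
file_is_closed = f.closed
os.remove(sample_path)

# The list is independent of the file and can still be processed.
upper = [line.strip().upper() for line in lines]

# Iterating over the closed file object fails instead.
try:
    next(f)
    closed_error = False
except ValueError:
    closed_error = True

print(file_is_closed, upper, closed_error)
```

With the lazy for line in f form, all processing has to happen while the file is still open.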
Upvotes: 0