xczzhh

Reputation: 678

Is a file object a list by default?

I've come across two versions of code that accomplish the same task with only a small difference between them:

with open("file") as f:
   for line in f:
     print line

and

with open("file") as f:
   data = f.readlines() 
   for line in data:
     print line 

My question is, is the file object f a list by default just like data? If not, why does the first chunk of code work? Which version is the better practice?

Upvotes: 9

Views: 276

Answers (4)

dawg

Reputation: 104015

In both cases, you are getting a file line-by-line. The method is different.

With your first version:

with open("file") as f:
   for line in f:
     print line

While you are iterating over the file line by line, the file contents are not fully resident in memory (unless it is a one-line file).

The open built-in function returns a file object, not a list. That object supports iteration; in this case it returns individual strings, each a run of characters in the file terminated by a newline or by the end of the file.

You can write a loop that is similar to what for line in f: print line is doing under the hood:

with open('file') as f:
    while True:
        try:
            line=f.next()
        except StopIteration:
            break
        else:
            print line 

With the second version:

with open("file") as f:
   data = f.readlines()    # equivalent to data = list(f)
   for line in data:
     print line

You are using a method of a file object (file.readlines()) that reads the entire file contents into memory as a list of the individual lines. The code is then iterating over that list.

You can also write a version of that which makes the underlying iterator explicit:

with open('file') as f:
    data=list(f)
    it=iter(data)
    while True:
        try:
            line=it.next()
        except StopIteration:
            break
        else:
            print line  

In both of your examples, you are using a for loop to loop over the items of an iterable. The items are the same in each case (individual lines of the file) but the underlying iterable is different. In the first version, it is a file object; in the second version, it is a list. Use the first version if you just want to deal with each line in turn. Use the second if you need a list of lines.
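As a rough illustration of that rule of thumb, the sketch below streams the file when only a count is needed and builds a list when indexing is needed. The function names and the sample file are invented for the example, and the code is written to run on Python 3, where print is a function.

```python
import os
import tempfile


def count_nonblank(path):
    """Stream the file: lines are consumed one at a time, no list is built."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())


def last_line(path):
    """Build a list: needed here because we index from the end."""
    with open(path) as f:
        lines = f.readlines()
    return lines[-1] if lines else ""


# Hypothetical sample file, created only for this demonstration.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("alpha\n\nbeta\ngamma\n")

nonblank = count_nonblank(sample_path)
final = last_line(sample_path)
os.remove(sample_path)

print(nonblank)
print(repr(final))
```

The counting function never needs more than the current line, so iterating over the file directly is enough; the last-line function uses negative indexing, which a file object does not support.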

Read Ned Batchelder's excellent overview on looping and iteration for more.

Upvotes: 7

Maciej Gol

Reputation: 15864

A file object is not a list; it is an object that conforms to the iterator interface (docs). That is, it implements an __iter__ method that returns an iterator object. That iterator object implements both the __iter__ and next methods, allowing iteration over the collection.

It happens that the file object is its own iterator (docs), meaning file.__iter__() returns self.

Both for line in file and lines = file.readlines() are equivalent in that they yield the same result when used to get or iterate over all lines in the file. However, file.next() buffers the contents of the file (it reads ahead) to speed up reading, which moves the file position to, or beyond, the point where the last returned line ended. This means that if you have used for line in file, read some lines, and then stopped the iteration before reaching the end of the file, a subsequent call to file.readlines() might return a first line that is not the full line following the last one seen in the for loop.

When you use for x in my_it, the interpreter calls my_it.__iter__(). The next() method is then called repeatedly on the object returned by that call, and each return value is assigned to x. When next() raises StopIteration, the loop ends.

Note: A valid iterator implementation should ensure that once StopIteration has been raised, it continues to be raised on all subsequent calls to next().
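To make the interface concrete, here is a minimal sketch of a hand-written iterator. The Countdown class is hypothetical and has nothing to do with files; it only shows the two methods involved. It is written for Python 3, where the method is spelled __next__, with an alias so the same class also works under the Python 2 name next.

```python
class Countdown(object):
    """Hypothetical iterator that yields n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Signal exhaustion by raising StopIteration.
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value

    next = __next__  # Python 2 looks for a method named `next`


counter = Countdown(3)
returns_self = iter(counter) is counter
values = [x for x in counter]
print(values)
```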
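A quick way to check this yourself is sketched below. The sample file is a throwaway created for the demonstration, and the code assumes Python 3, using the built-in next() rather than calling the method directly.

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("first\nsecond\n")

with open(sample_path) as f:
    file_is_own_iterator = iter(f) is f  # the file hands back itself
    first = next(f)                      # so next() works on the file directly

lines = ["first\n", "second\n"]
# A list, by contrast, hands back a separate iterator object.
list_is_own_iterator = iter(lines) is lines

os.remove(sample_path)

print(file_is_own_iterator)
print(list_is_own_iterator)
```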
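One way to sidestep that pitfall, assuming you want a few leading lines and then the remainder, is to stay with iteration throughout instead of switching to readlines() halfway. The helper name head_and_rest and the sample file are invented for this sketch, which is written for Python 3.

```python
import os
import tempfile
from itertools import islice


def head_and_rest(path, n):
    """Return the first n lines and the remaining lines as two lists."""
    with open(path) as f:
        head = list(islice(f, n))  # pulls at most n items from the file iterator
        rest = list(f)             # keeps going through the same iterator
    return head, rest


# Hypothetical sample file, created only for this demonstration.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("a\nb\nc\nd\n")

head, rest = head_and_rest(sample_path, 2)
os.remove(sample_path)

print(head)
print(rest)
```

Because both steps go through the iterator, the second step picks up exactly where the first one stopped.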
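A practical consequence for files is that a second loop over the same open file object finds nothing left, unless the file is rewound. The sketch below illustrates this with a throwaway file; it assumes Python 3 and a file that nobody else is writing to in the meantime.

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("one\ntwo\n")

with open(sample_path) as f:
    first_pass = [line for line in f]   # consumes every line
    second_pass = [line for line in f]  # iterator is already exhausted
    f.seek(0)                           # rewind to the start of the file
    third_pass = [line for line in f]   # lines are available again

os.remove(sample_path)

print(first_pass)
print(second_pass)
print(third_pass)
```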

Upvotes: 11

Marcin

Reputation: 49856

A file is an iterable. Lots of objects, including lists, are iterable, which just means that they can be used in a for loop to yield objects one after another, each of which is bound to the loop variable.

Both versions of your code iterate line by line. The second version reads the whole file into memory and constructs a list; the first may not read the whole file up front. You might prefer the second if you want to close the file before something else modifies it; you might prefer the first if the file is very large.
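The distinction between "iterable" and "list" can be checked directly. In the sketch below, is_iterable is a hypothetical helper that simply tries calling iter() on an object, and the sample file is a throwaway; the code assumes Python 3.

```python
import os
import tempfile


def is_iterable(obj):
    """Return True if iter() accepts the object, False otherwise."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


# Hypothetical sample file, created only for this demonstration.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("one\ntwo\n")

with open(sample_path) as f:
    file_iterable = is_iterable(f)      # a file can be looped over
    file_is_list = isinstance(f, list)  # but it is not a list

os.remove(sample_path)

checks = {
    "list": is_iterable([1, 2]),
    "str": is_iterable("ab"),
    "dict": is_iterable({"k": 1}),
    "int": is_iterable(42),
}

print(file_iterable, file_is_list)
print(checks)
```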

Upvotes: 0

jgritty

Reputation: 11935

f is a filehandle, not a list. It is iterable.

Upvotes: 4
