Reputation: 713

Python file variable - what is it?

I just started with Python, and since my background is in lower-level languages (Java, C++), I can't quite grasp some things.

So, in Python one can create a file variable by opening a text file, for example, and then iterate through its lines like this:

import sys

f = open(sys.argv[1])
for line in f:
    pass  # do something with line

However, if I try f[0], the interpreter gives an error. So what structure does the f object have, and how do I know in general whether I can apply a for ... in ... : loop to an object?

Upvotes: 5

Answers (10)

7stud

Reputation: 48609

1) f is not a list. Is there any book, tutorial, or website that told you f is a list? If not, why do you think you can treat f as a list? You certainly can't treat a file in C++ or Java as an array can you? Why not?

2) In Python, a for loop does the following things:

a) The for loop calls __iter__() on the object to the right of 'in', 
   e.g. f.__iter__(). 

b) The for loop repeatedly calls next() (or __next__() in Python 3) on whatever 
   f.__iter__() returns, until a StopIteration exception signals the end.

So f.__iter__() can return an object that does whatever it wants when next() is called on it. It just so happens that Guido decided that the object returned by f.__iter__() should provide lines from the file when its next() method is called.

how do i know in general, if i can apply for ... in ... : loop to an object?

If the object has an __iter__() method, and the __iter__() method returns an object with a next() method, you can apply a for-in loop to it. Or in other words, you learn from experience which objects implement the iterator protocol.
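
The two steps above can be sketched by hand. This is only an illustration written for Python 3 (where the method is __next__()); io.StringIO is used as a stand-in for a real file opened with open(), and the names iterator and lines are made up for the example:

```python
import io

# io.StringIO stands in for an opened text file (an assumption for this sketch).
f = io.StringIO("first\nsecond\nthird\n")

lines = []
iterator = iter(f)             # step a): the for loop calls f.__iter__()
while True:
    try:
        line = next(iterator)  # step b): calls iterator.__next__() repeatedly
    except StopIteration:      # raised by the iterator when nothing is left
        break
    lines.append(line)

print(lines)
```

This is roughly equivalent to writing for line in f: lines.append(line).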

Upvotes: 4

Blender

Reputation: 298364

f is a file object. The documentation lists its structure, so I'll only explain the indexing/iterating behavior.

An object is indexable only if it implements __getitem__, which you can check by calling hasattr(f, '__getitem__') or just calling f[0] and seeing if it throws an error. In fact, that's exactly what your error message tells you:

TypeError: 'file' object has no attribute '__getitem__'

File objects are not indexable. You can call f.readlines() to get a list of lines, and that list is indexable.
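
A small sketch of that check, written for Python 3 and using io.StringIO as a stand-in for a real file object (the variable names are only for illustration, and the exact wording of the TypeError differs between Python versions):

```python
import io

f = io.StringIO("alpha\nbeta\n")   # stand-in for open(...), an assumption

file_indexable = hasattr(f, "__getitem__")   # False: no indexing support
file_iterable = hasattr(f, "__iter__")       # True: works in a for loop

try:
    f[0]
    indexing_failed = False
except TypeError:
    indexing_failed = True                   # f[0] is rejected

lines = f.readlines()                        # a plain list of lines
list_indexable = hasattr(lines, "__getitem__")
first_line = lines[0]                        # indexing the list works

print(file_indexable, file_iterable, indexing_failed, list_indexable)
```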

Objects that implement __iter__ are iterable with the for ... in ... syntax. Now there are actually two types of iterable objects: container objects and iterator objects. Iterator objects implement two methods: __iter__ and __next__. Container objects implement only __iter__ and return an iterator object, which is actually what you're iterating over. File objects are their own iterators, as they implement both methods.
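
One way to see the difference between a container and an iterator is to compare what iter() returns. A minimal Python 3 sketch, again assuming io.StringIO behaves like a file object for this purpose:

```python
import io

numbers = [1, 2, 3]            # a container
f = io.StringIO("a\nb\n")      # stand-in for a file object (an assumption)

# A container hands out a separate iterator; a file object returns itself.
list_is_own_iterator = iter(numbers) is numbers
file_is_own_iterator = iter(f) is f

# Consequence: a container can be looped over repeatedly,
# while a file object is used up after one pass.
list_passes = (list(numbers), list(numbers))
file_passes = (list(f), list(f))

print(list_is_own_iterator, file_is_own_iterator)
print(list_passes)
print(file_passes)
```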

If you want to get the next item in an iterable, you can use the next() function:

first_line = next(f)
second_line = next(f)
next_line_that_starts_with_0 = next(line for line in f if line.startswith('0'))

One word of caution: iterables generally aren't "rewindable", so once you progress through the iterable, you can't really go back. To "rewind" a file object, you can use f.seek(0), which will set the current position back to the beginning of the file.
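
For instance, here is a small Python 3 sketch of rewinding. It writes a throwaway temporary file first so that it is self-contained; the file contents and variable names are made up for the example:

```python
import os
import tempfile

# Create a small temporary file to read from.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\n")
    path = tmp.name

with open(path) as f:
    first_pass = list(f)    # consumes every line
    exhausted = list(f)     # nothing left to read
    f.seek(0)               # move back to the beginning of the file
    second_pass = list(f)   # the lines are available again

os.remove(path)             # clean up the temporary file
print(first_pass, exhausted, second_pass)
```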

Upvotes: 9

octopusgrabbus

Reputation: 10695

Once you create f, it's a file object. readlines is one of the file object's methods. The

for line in f.readlines():

starts a loop that lets the code you write process one line of the file at a time. You can use the for loop because the object returned from readlines() is iterable.

Upvotes: 0

steveha

Reputation: 76725

In Python, every data item is a Python object. So whatever is returned to you by open() is an object; specifically, it is a file object, which represents a file handle.

You already know how to do this:

handle = open("some_file.txt", "r")

This is, conceptually, very similar to the equivalent in C:

FILE *handle;

handle = fopen("some_file.txt", "r");

In C, the only useful thing you can do with that handle variable is to pass it to calls like fread(). In Python, the object has method functions associated with it. So, here is C to read 100 bytes from a file and then close it:

FILE *handle;

handle = fopen("some_file.txt", "r");
result = fread(buffer, 1, 100, handle);  // read 100 bytes into buffer
fclose(handle);

And here is equivalent Python:

handle = open("some_file.txt", "r")
buffer = handle.read(100)  # read up to 100 bytes into buffer
handle.close()

A good way to find out more about Python functions and objects is to use the built-in help() command from the Python prompt. Try help(open) and it doesn't tell you much, but does tell you that it returns a file object. So then try help(file) and now you get a whole lot of information. You can read about the .close() method, .read(), and others such as .readlines().

But the one that confused you was iterating the handle object. Since a very common case is reading lines from a file, Python makes file handles work as an iterator, and when you iterate you get one line at a time from the file.

List objects in Python are both indexable and iterable, so if you have a list named a you can do both a[i] and for x in a:. Looking up an item by position, a[i], is indexing. File handle objects do not support indexing but do support iteration.

In several answers here you will see the with statement. This is best practice in Python. A with statement only works with some kinds of objects in Python; the objects have to support a couple of special method functions. All you really need to know right now about with is that when you can use it, some needed initialization and finalization work can be done for you. In the case of opening a file, the with statement will take care of closing the file for you. The great part is that the with statement guarantees that the finalization will be done even if the code raises an exception.

Here's idiomatic Python for the above example:

with open("some_file.txt") as handle:
    buffer = handle.read(100)
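
To illustrate the guarantee, here is a hedged Python 3 sketch in which an exception is raised inside the with block. The temporary file and the ValueError are invented purely for the demonstration:

```python
import os
import tempfile

# Create a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello")
    path = tmp.name

try:
    with open(path) as handle:
        data = handle.read(100)
        raise ValueError("simulated failure while processing")
except ValueError:
    pass

# The with statement closed the file even though the block raised.
closed_after_error = handle.closed
os.remove(path)
print(data, closed_after_error)
```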

Upvotes: 2

Cairnarvon

Reputation: 27822

This demonstrates the difference between a sequence type, which supports indexing, slicing, and limited iteration, and an iterator type, which doesn't support indexing or slicing but does support more advanced iteration, maintaining internal state to do it.

A file object is an example of the latter. You can extract the contents as lines and store them in a sequence type (specifically, a list) through the readlines method, as others have pointed out.

Upvotes: 2

Sukrit Kalra

Reputation: 34513

The Interpreter gives an error

TypeError: 'file' object has no attribute '__getitem__'

which tells you that the type file does not allow indexing like f[0] and so on. If a type has the attribute __getitem__, it allows indexing; otherwise it does not. Files fall into the latter category.

You can learn more about files by doing:

>>> fileTest = open('fileName')
>>> type(fileTest)
<type 'file'>
>>> dir(fileTest)
['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']

for loops can generally be applied to any structure which is iterable.

If you want a list of lines, you can do:

>>> with open('fileName') as f:
         lines = f.readlines()

Or by doing,

>>> with open('fileName') as f:
         lines = [line for line in f]

Upvotes: 0

Rajeev

Reputation: 46949

You can always use dir(f) to see the structure of f. Here f is a file object:

 ['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']

Upvotes: 0

shadyabhi

Reputation: 17234

The reason you are able to do this is that a file object is an iterable.

Upvotes: 1

Jakub M.

Reputation: 33857

A file variable is something like a file handle in C. You open it, operate on it (read, write), and close it at the end.

handler.read() # read the whole file content at once

handler.write(blob) # write something to it

handler.readlines() # read a list of lines

for line in handler:
    print line # iterate lines nicely

The last example is better than for line in handler.readlines(), because it reads lines only as you need them, whereas readlines() reads all the lines at once (which can be a problem with large files).
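
A rough Python 3 sketch of the two approaches side by side. The helper names count_lines_lazy and count_lines_eager are hypothetical, and io.StringIO stands in for a real file, so the memory difference is not actually visible here; it only shows that both give the same answer:

```python
import io

def count_lines_lazy(f):
    """Count lines by iterating, holding one line at a time."""
    return sum(1 for _ in f)

def count_lines_eager(f):
    """Count lines by first building the full list of lines."""
    return len(f.readlines())

text = "x\n" * 1000
lazy_count = count_lines_lazy(io.StringIO(text))
eager_count = count_lines_eager(io.StringIO(text))
print(lazy_count, eager_count)
```

With a real multi-gigabyte file, the eager version would need to hold every line in memory at once, while the lazy version would not.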

Upvotes: 0

Bryan

Reputation: 6752

What you're looking for is readlines http://docs.python.org/2/library/stdtypes.html#file.readlines

file_lines = f.readlines()

for line in file_lines:
    print line

print file_lines[0] # You can access an element by index

Upvotes: 1
