Reputation: 713
I just started with Python, and since my background is in more low-level languages (Java, C++), I just can't really get some things.
So, in Python one can create a file variable by opening a text file, for example, and then iterate through its lines like this:
import sys
f = open(sys.argv[1])
for line in f:
    pass  # do something
However, if I try f[0], the interpreter gives an error. So what structure does the f object have, and how do I know in general whether I can apply a for ... in ... : loop to an object?
Upvotes: 5
Views: 21853
Reputation: 48609
1) f is not a list. Is there any book, tutorial, or website that told you f is a list? If not, why do you think you can treat f as a list? You certainly can't treat a file in C++ or Java as an array, can you? Why not?
2) In Python, a for loop does the following things:
a) The for loop calls __iter__() on the object to the right of 'in', e.g. f.__iter__().
b) The for loop repeatedly calls next() (or __next__() in Python 3) on whatever f.__iter__() returns, until that call raises StopIteration.
So f.__iter__() can return an object that does whatever it wants when next() is called on it. It just so happens that Guido decided that the object returned by f.__iter__() should provide lines from the file when its next() method is called.
how do i know in general, if i can apply for ... in ... : loop to an object?
If the object has an __iter__() method, and the __iter__() method returns an object with a next() method (__next__() in Python 3), you can apply a for-in loop to it. Or in other words, you learn from experience which objects implement the iterator protocol.
Upvotes: 4
Reputation: 298364
f is a file object. The documentation lists its structure, so I'll only explain the indexing/iterating behavior.
An object is indexable only if it implements __getitem__, which you can check by calling hasattr(f, '__getitem__') or just calling f[0] and seeing if it throws an error. In fact, that's exactly what your error message tells you:
TypeError: 'file' object has no attribute '__getitem__'
File objects are not indexable. You can call f.readlines() to get a list of lines, which itself is indexable.
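Here is a small sketch of that check, written in Python 3 and using io.StringIO as a stand-in for a real file (an assumption made so the snippet runs without a file on disk). The describe helper is a hypothetical name, not part of any library.

```python
import io


def describe(obj):
    """Report which of the two protocols an object appears to support."""
    return {
        "indexable": hasattr(obj, "__getitem__"),
        "iterable": hasattr(obj, "__iter__"),
    }


fake_file = io.StringIO("x\ny\n")   # in-memory stand-in for open(...)
file_caps = describe(fake_file)

try:
    fake_file[0]                    # file-like objects do not support indexing
    indexing_failed = False
except TypeError:
    indexing_failed = True

fake_file.seek(0)
lines = fake_file.readlines()       # a plain list, which is indexable
list_caps = describe(lines)
first = lines[0]
```

The exact wording of the TypeError differs between Python 2 and Python 3, so the sketch only checks that the error is raised.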
Objects that implement __iter__ are iterable with the for ... in ... syntax. Now there are actually two types of iterable objects: container objects and iterator objects. Iterator objects implement two methods: __iter__ and __next__. Container objects implement only __iter__ and return an iterator object, which is actually what you're iterating over. File objects are their own iterators, as they implement both methods.
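The container/iterator difference can be seen directly. A rough sketch in Python 3, again assuming io.StringIO behaves like a real file object for this purpose:

```python
import io

numbers = [1, 2, 3]                 # a container
fake_file = io.StringIO("a\nb\n")   # in-memory stand-in for open(...)

# A list hands out a separate iterator object; a file returns itself.
list_is_own_iterator = iter(numbers) is numbers
file_is_own_iterator = iter(fake_file) is fake_file

# Consequence: a list can be looped over repeatedly,
# while a file is exhausted after the first pass.
list_first_pass = list(numbers)
list_second_pass = list(numbers)
file_first_pass = list(fake_file)
file_second_pass = list(fake_file)
```

This is why a second for loop over the same open file appears to do nothing unless the position is reset.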
If you want to get the next item from an iterator (such as a file object), you can use the next() function:
first_line = next(f)
second_line = next(f)
next_line_that_starts_with_0 = next(line for line in f if line.startswith('0'))
One word of caution: iterables generally aren't "rewindable", so once you progress through the iterable, you can't really go back. To "rewind" a file object, you can use f.seek(0), which will set the current position back to the beginning of the file.
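A minimal sketch of exhausting and rewinding, in Python 3 with io.StringIO standing in for a real file (an assumption for self-containment; the sample contents are made up):

```python
import io

f = io.StringIO("0 zero\n1 one\n0 again\n")  # in-memory stand-in for open(...)

first_line = next(f)          # consumes the first line
rest = list(f)                # consumes everything that is left
exhausted = next(f, None)     # the default is returned instead of StopIteration

f.seek(0)                     # move back to the start of the file
after_rewind = next(f)        # the first line is available again
```

Passing a default as the second argument to next() is a convenient way to avoid handling StopIteration yourself.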
Upvotes: 9
Reputation: 10695
Once you create f, it's a file object. readlines is one of the file object's methods. The line
for line in f.readlines():
starts a loop that lets the code you write process one line of the file at a time. You can use the for loop because the object returned from readlines() (a list) is iterable.
Upvotes: 0
Reputation: 76725
In Python, every data item is a Python object. So whatever is returned to you by open() is an object; specifically, it is a file object, which represents a file handle.
You already know how to do this:
handle = open("some_file.txt", "r")
This is, conceptually, very similar to the equivalent in C:
FILE *handle;
handle = fopen("some_file.txt", "r");
In C, the only useful thing you can do with that handle variable is to pass it to calls like fread(). In Python, the object has method functions associated with it. So, here is C to read 100 bytes from a file and then close it:
FILE *handle;
handle = fopen("some_file.txt", "r");
result = fread(buffer, 1, 100, handle); // read 100 bytes into buffer
fclose(handle);
And here is equivalent Python:
handle = open("some_file.txt", "r")
buffer = handle.read(100)  # read up to 100 characters into buffer
handle.close()
A good way to find out more about Python functions and objects is to use the built-in help() command from the Python prompt. Try help(open) and it doesn't tell you much, but does tell you that it returns a file object. So then try help(file) and now you get a whole lot of information. You can read about the .close() method, .read(), and others such as .readlines().
But the one that confused you was iterating the handle object. Since a very common case is reading lines from a file, Python makes file handles work as an iterator, and when you iterate you get one line at a time from the file.
List objects in Python are both indexable and iterable, so if you have a list named a you can do both a[i] and for x in a:. Looking up an item by position, a[i], is indexing. File handle objects do not support indexing but do support iteration.
In several answers here you will see the with statement. This is best practice in Python. A with statement only works with some kinds of objects in Python; the objects have to support a couple of special method functions. All you really need to know right now about with is that when you can use it, some needed initialization and finalization work can be done for you. In the case of opening a file, the with statement will take care of closing the file for you. The great part is that the with statement guarantees that the finalization will be done even if the code raises an exception.
Here's idiomatic Python for the above example:
with open("some_file.txt") as handle:
    buffer = handle.read(100)
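To see the "even if the code raises an exception" guarantee in action, here is a rough sketch in Python 3. It creates a throwaway temporary file so it can run anywhere; the file contents and the simulated ValueError are purely illustrative.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on some_file.txt.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("hello world\n")

try:
    with open(path) as handle:
        buffer = handle.read(5)
        raise ValueError("simulated failure inside the with block")
except ValueError:
    pass

# The with statement closed the file even though the block raised.
closed_after_error = handle.closed

os.remove(path)  # clean up the temporary file
```

Note that handle is still a valid name after the block; it simply refers to a closed file.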
Upvotes: 2
Reputation: 27822
This demonstrates the difference between a sequence type, which supports indexing, slicing, and limited iteration, and an iterator type, which doesn't support indexing or slicing, but more advanced iteration, maintaining internal state to do it.
A file object is an example of the latter. You can extract the contents as lines and store them in a sequence type (specifically, a list) through the readlines method, as others have pointed out.
Upvotes: 2
Reputation: 34513
The interpreter gives the error
TypeError: 'file' object has no attribute '__getitem__'
which tells you that the type file does not allow indexing like f[0] and so on. If a type has the attribute __getitem__, it allows indexing; otherwise it does not. In the case of files, it is the latter.
You can find out more about files by doing:
>>> fileTest = open('fileName')
>>> type(fileTest)
<type 'file'>
>>> dir(fileTest)
['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']
for loops can generally be applied to any structure which is iterable.
If you want a list of lines, then you can do:
>>> with open('fileName') as f:
...     lines = f.readlines()
Or:
>>> with open('fileName') as f:
...     lines = [line for line in f]
Upvotes: 0
Reputation: 46949
You can always use dir(f) to see the structure of f; f is a file object:
['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']
Upvotes: 0
Reputation: 17234
The reason you are able to do this is that a file object is an iterable.
Upvotes: 1
Reputation: 33857
A file variable is something like a file handle in C. You open it, operate on it (read, write), and close it at the end.
handler.read()        # read the whole file content at once
handler.write(blob)   # write something to the file
handler.readlines()   # read all lines into a list
for line in handler:
    print line        # iterate over the lines one at a time
The last example is better than for line in handler.readlines(), because the former reads lines as you need them, while the latter reads all the lines into memory at once (which can be trouble with large files).
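The two styles can be compared side by side. A small sketch in Python 3 syntax; count_lines_lazily and count_lines_eagerly are hypothetical helper names, and io.StringIO stands in for a real (possibly large) file.

```python
import io


def count_lines_lazily(f):
    """Iterate over the file itself: only one line is held at a time."""
    return sum(1 for _ in f)


def count_lines_eagerly(f):
    """Call readlines(): the whole file is loaded into a list first."""
    return len(f.readlines())


text = "alpha\nbeta\ngamma\n"
lazy_count = count_lines_lazily(io.StringIO(text))
eager_count = count_lines_eagerly(io.StringIO(text))
```

Both give the same answer; the difference only shows up in memory use, which this tiny example does not measure.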
Upvotes: 0
Reputation: 6752
What you're looking for is readlines
http://docs.python.org/2/library/stdtypes.html#file.readlines
file_lines = f.readlines()
for line in file_lines:
    print line
print file_lines[0] # You can access an element by index
Upvotes: 1