Reputation: 587
I'm not exactly new to Python, but I do still have trouble understanding what makes something "Pythonic" (and the converse).
So forgive me if this is a stupid question, but why can't I get the size of a file by doing a len(file)?
file.__len__ is not even implemented, so it's not as if it is needed for something else. Would it be confusing or inconsistent for some reason if it were implemented to return the file size?
Upvotes: 25
Views: 57086
Reputation: 304483
A file is an iterator. To find the number of lines, you need to read the entire file:
sum(1 for line in file)
If you want the number of bytes in a file, use os.stat, e.g.:
import os
os.stat(filename).st_size
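To make the difference between the two notions of "size" concrete, here is a minimal sketch. The helper names count_lines and size_in_bytes are made up for illustration, and the temporary file is just demo data, not anything from the question.

```python
import os
import tempfile

def count_lines(path):
    # Iterating over a file yields one line at a time, so the whole file is read.
    with open(path, "rb") as f:
        return sum(1 for _ in f)

def size_in_bytes(path):
    # os.stat asks the OS for metadata; the file contents are not read.
    return os.stat(path).st_size

# Demo with a throwaway file (hypothetical data).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"one\ntwo\nthree\n")
    demo_path = tmp.name

line_count = count_lines(demo_path)
byte_count = size_in_bytes(demo_path)
os.remove(demo_path)
```

The line count costs a full pass over the data, while the byte count is a single metadata lookup, which is part of why the two are not interchangeable as a meaning for len().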
Upvotes: 25
Reputation: 10602
Files have a broader definition, especially in Unix, than you may be thinking. What is the length of a printer, for example? Or a CD-ROM drive? Both are files in /dev on Unix, and, to some extent, in Windows too.
For what we normally think of as a file, what would its length be? The size of the variable? The size of the file in bytes? The latter makes more sense, but then it gets ickier. Should the size of the file's contents be reported, or its size on disk (rounded up to the allocation unit size)? The question arises again for sparse files: files with large empty sections that take up no space on disk but still count toward the file's normally reported size. Some file systems, such as NTFS and XFS, support them.
Of course, the answer to all of those could be, "just pick one and document what you picked." Perhaps that is exactly what should be done, but to be Pythonic, something usually must be clear-cut without having to read a lot of docs. len(string) is mostly obvious (one may ask whether bytes or characters are counted), len(array) is obvious, len(file) maybe not quite enough.
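As a rough illustration of the "contents versus size on disk" ambiguity, the sketch below compares st_size with the allocated space. It assumes a platform where os.stat exposes st_blocks (Unix-like systems, counted in 512-byte units); elsewhere the on-disk figure is simply reported as None. The function name describe_size and the demo file are hypothetical.

```python
import os
import tempfile

def describe_size(path):
    st = os.stat(path)
    # st_blocks is not available on every platform, so fall back to None.
    blocks = getattr(st, "st_blocks", None)
    on_disk = blocks * 512 if blocks is not None else None
    return {"apparent": st.st_size, "on_disk": on_disk}

with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    # Seek 1 MiB past the start and write a single byte. A file system
    # with sparse-file support may leave the skipped range unallocated.
    tmp.seek(1024 * 1024)
    tmp.write(b"x")
    sparse_path = tmp.name

sizes = describe_size(sparse_path)
os.remove(sparse_path)
```

Whether the two numbers actually differ depends on the file system, which is rather the point: a single len(file) would have to pick one of them.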
Upvotes: 26
Reputation: 42120
So forgive me if this is a stupid question, but why can't I get the size of a file by doing a len(file)?
Charles Burns' answer makes a good point about Unix's "everything is a file" philosophy. Although you can always use os.fstat() to get the 'size' for any file descriptor, with something like...
import os
f = open(anything)
size = os.fstat(f.fileno()).st_size
...it may not return anything meaningful or useful...
>>> import os, sys
>>> os.fstat(sys.stdout.fileno()).st_size
0
>>> fd1, fd2 = os.pipe()
>>> os.fstat(fd1).st_size
0
I think the reason is that a Python file object, or file-like object, is supposed to represent a stream, and streams don't inherently have a length, especially if they're write-only, like sys.stdout.
Usually, the only thing you can guarantee about a Python file-like object is that it will support at least one of read() or write(), and that's about it.
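One way to live with that is to ask for a length only when the stream can actually provide one. The following is a sketch, not an established API: stream_size is a hypothetical helper, it assumes a Python 3 binary stream, and it returns None for anything that reports itself as non-seekable.

```python
import io
import os

def stream_size(f):
    """Return the size in bytes of a seekable binary stream, or None."""
    # seekable() comes from io.IOBase; objects lacking it are treated
    # as non-seekable here (an assumption of this sketch).
    seekable = getattr(f, "seekable", None)
    if seekable is None or not seekable():
        return None
    pos = f.tell()
    try:
        # seek() returns the new absolute position, i.e. the end offset.
        return f.seek(0, os.SEEK_END)
    finally:
        # Put the stream back where the caller left it.
        f.seek(pos)

# An in-memory buffer has a well-defined length...
buffer_size = stream_size(io.BytesIO(b"hello"))

# ...but the read end of a pipe does not.
read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "rb") as pipe_reader:
    pipe_size = stream_size(pipe_reader)
os.close(write_fd)
```

The need for a "no answer" case is a fair hint at why a universal len() on file objects would be awkward.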
Upvotes: 5
Reputation: 711
A simple way to measure the number of bytes would be:
import os
# Open in binary mode so that tell() reports a byte offset.
file = open('file.bin', 'rb')
# Seek to the end (0 bytes relative to the end).
file.seek(0, os.SEEK_END)
length = file.tell()
file.close()
Upvotes: 3
Reputation: 251186
file returns an iterator, so you can't use len() on it.
To get the size of a file you can use os.stat:
>>> import os
>>> foo = os.stat("abc")
>>> foo.st_size
193L
If by size you mean the number of lines, then try one of these:
len(open("abc").readlines())
or
sum(1 for _ in open("abc"))
Upvotes: 9
Reputation: 592
I would say it's because finding the length depends on OS-specific functionality. You can find the length of a file with this code:
import os
os.path.getsize('C:\\file.txt')
You could also read the entire file into a string and take the length of that string. However, you would want to be sure that the file is not so large that it eats up all your memory.
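If you do want to count by reading, one way to keep memory use bounded is to read in fixed-size chunks instead of all at once. This is only a sketch: count_bytes_by_reading and the 64 KiB chunk size are arbitrary choices for illustration, and the temporary file stands in for a real one.

```python
import os
import tempfile

def count_bytes_by_reading(path, chunk_size=64 * 1024):
    total = 0
    with open(path, "rb") as f:
        while True:
            # Only one chunk is held in memory at a time.
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total

# Demo file (hypothetical data): 200,000 zero bytes.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"\0" * 200_000)
    big_path = tmp.name

counted = count_bytes_by_reading(big_path)
reported = os.path.getsize(big_path)
os.remove(big_path)
```

For a regular file the two results should agree, and os.path.getsize gets there without touching the contents.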
Upvotes: 2