Dr. Kickass

Reputation: 587

Why no len(file) in Python?

I'm not exactly new to Python, but I do still have trouble understanding what makes something "Pythonic" (and the converse).

So forgive me if this is a stupid question, but why can't I get the size of a file by doing a len(file)?

file.__len__ is not even implemented, so it's not like it's needed for something else? Would it be confusing/inconsistent for some reason if it was implemented to return the file size?

Upvotes: 25

Views: 57086

Answers (6)

John La Rooy

Reputation: 304483

A file object is an iterator. To find the number of lines, you need to read the entire file:

sum(1 for line in file)

If you want the number of bytes in a file, use os.stat, e.g.:

import os
os.stat(filename).st_size
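To make the "file is an iterator" point concrete, here is a minimal sketch. The temporary file and its three-line content are made up for illustration. Counting lines consumes the iterator, so a second pass over the same open file finds nothing, whereas os.stat asks the operating system and never reads the content.

```python
import os
import tempfile

# Hypothetical sample data: three short lines, written without newline translation.
with tempfile.NamedTemporaryFile("w", newline="", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\n")
    path = tmp.name

with open(path) as f:
    first_count = sum(1 for _ in f)   # reads every line up to the end of the file
    second_count = sum(1 for _ in f)  # the iterator is already exhausted

size_in_bytes = os.stat(path).st_size  # metadata lookup, no content is read

os.remove(path)
print(first_count, second_count, size_in_bytes)
```

To count the lines again you would have to call f.seek(0) or reopen the file.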

Upvotes: 25

Charles Burns

Reputation: 10602

Files have a broader definition, especially in Unix, than you may be thinking. What is the length of a printer, for example? Or a CD-ROM drive? Both are files under /dev on Unix, and Windows treats devices in a loosely similar way.

For what we normally think of as a file, what would its length be? The size of the variable? The size of the file in bytes? The latter makes more sense, but then it gets ickier. Should the size of the file's contents be reported, or its size on disk (rounded up to a multiple of the allocation unit size)? The question arises again for sparse files: files with large empty sections that take no space on disk but still count toward the normally reported size, supported by some file systems such as NTFS and XFS.

Of course, the answer to all of those could be, "just pick one and document what you picked." Perhaps that is exactly what should be done, but to be Pythonic, something usually must be clear-cut without having to read a lot of docs. len(string) is mostly obvious (one may ask whether bytes or characters are counted), len(array) is obvious, but len(file) is perhaps not obvious enough.
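The difference between content size and size on disk can be seen with os.stat. This is only a sketch: size_report is a hypothetical helper, and st_blocks is not available on every platform (it is missing on Windows), so the code falls back to None there. Where it exists, st_blocks counts 512-byte blocks.

```python
import os
import tempfile

def size_report(path):
    """Return (apparent size in bytes, allocated bytes or None if unknown)."""
    st = os.stat(path)
    # st_blocks is platform-dependent; where present it counts 512-byte blocks.
    blocks = getattr(st, "st_blocks", None)
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated

# Hypothetical sample file holding just 10 bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10)
    path = tmp.name

apparent, allocated = size_report(path)
os.remove(path)
print(apparent, allocated)
```

On many file systems the allocated figure for such a tiny file is a whole allocation unit rather than 10, which is exactly the ambiguity a len(file) would have to paper over.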

Upvotes: 26

Aya

Reputation: 42120

So forgive me if this is a stupid question, but why can't I get the size of a file by doing a len(file)?

Charles Burns' answer makes a good point about Unix's "everything is a file" philosophy, and although you can always use os.fstat() to get the 'size' for any file descriptor, with something like...

import os

f = open(anything)
size = os.fstat(f.fileno()).st_size

...it may not return anything meaningful or useful...

>>> import os, sys
>>> os.fstat(sys.stdout.fileno()).st_size
0
>>> fd1, fd2 = os.pipe()
>>> os.fstat(fd1).st_size
0

I think the reason is that a Python file object, or file-like object, is supposed to represent a stream, and streams don't inherently have a length, especially if they're write-only, like sys.stdout.

Usually, the only thing you can guarantee about a Python file-like object is that it will support at least one of read() or write(), and that's about it.
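One way to see the stream point in code is to ask the object whether it can seek before trying to measure it. The helper name size_if_seekable is made up for this sketch. It assumes an object that follows the io module's interface, so it has seekable(), seek() and tell(). Arbitrary file-like objects may not even offer those.

```python
import io
import os

def size_if_seekable(stream):
    """Return the stream's total size in bytes, or None if it cannot seek."""
    if not stream.seekable():
        return None
    position = stream.tell()            # remember where we were
    size = stream.seek(0, os.SEEK_END)  # seek() returns the new absolute offset
    stream.seek(position)               # put the position back
    return size

# An in-memory binary stream has a well-defined size.
memory_size = size_if_seekable(io.BytesIO(b"hello"))

# A pipe is a pure stream: there is no end to seek to.
read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "rb") as reader, os.fdopen(write_fd, "wb") as writer:
    pipe_size = size_if_seekable(reader)

print(memory_size, pipe_size)
```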

Upvotes: 5

gepoch

Reputation: 711

A simple way to measure the size in bytes would be:

import os

file = open('file.bin', 'rb')
# Seek to the end (offset 0 relative to the end of the file).
file.seek(0, os.SEEK_END)
length = file.tell()
file.close()
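Note that seeking to the end gives a byte offset, which is not always the number of characters. A small sketch, using a made-up UTF-8 sample string written to a temporary file, shows the two figures diverging as soon as the text contains a non-ASCII character.

```python
import os
import tempfile

text = "héllo"  # 5 characters, but "é" occupies 2 bytes in UTF-8

with tempfile.NamedTemporaryFile("w", encoding="utf-8", delete=False) as tmp:
    tmp.write(text)
    path = tmp.name

# Binary mode: tell() after seeking to the end is the size in bytes.
with open(path, "rb") as f:
    f.seek(0, os.SEEK_END)
    byte_length = f.tell()

# Text mode: the character count requires decoding the whole content.
with open(path, encoding="utf-8") as f:
    char_length = len(f.read())

os.remove(path)
print(byte_length, char_length)
```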

Upvotes: 3

Ashwini Chaudhary

Reputation: 251186

A file object is an iterator and does not define a length, so you can't use len() on it.

To get the size of a file you can use os.stat:

>>> import os
>>> foo = os.stat("abc")
>>> foo.st_size
193L

If by size you mean the number of lines, then try one of these:

len(open("abc").readlines())

or

sum(1 for _ in open("abc"))

Upvotes: 9

wardd

Reputation: 592

I would say it's because finding the length depends on OS-specific functionality. You can find the length of a file with this code:

import os
os.path.getsize('C:\\file.txt')

You could also read the entire file into a string and take the length of that string. However, you would want to be sure the file is not so large that it eats up all your memory.
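If you do want to count by reading, the memory concern can be avoided by reading in fixed-size chunks. This is a rough sketch: count_bytes and the 64 KiB chunk size are arbitrary choices for illustration, and the temporary file stands in for a real one.

```python
import os
import tempfile

def count_bytes(path, chunk_size=64 * 1024):
    """Count a file's bytes by reading it in chunks, never all at once."""
    total = 0
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            total += len(chunk)
    return total

# Hypothetical sample file of 200,000 bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 200_000)
    path = tmp.name

counted = count_bytes(path)
reported = os.path.getsize(path)
os.remove(path)
print(counted, reported)
```

For a regular file, os.path.getsize gives the same number without reading anything, so the chunked loop mostly makes sense for streams that have no size to query.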

Upvotes: 2
