Reputation: 119
How can I quickly calculate the total size of a large directory and count all of its files in Python, in a cross-platform way? This is my current code, but it is very slow on directories with many files (over 100,000):
import os

filescount = 0
totalsize = 0

class filecounter:
    def count(self, scandir):
        global filescount
        global totalsize
        # Make sure the directory path ends with a separator
        if not scandir.endswith(('/', '\\')):
            scandir = scandir + '/'
        try:
            for item in os.listdir(scandir):
                if os.path.isdir(scandir + item):
                    filecounter().count(scandir + item)
                else:
                    totalsize = totalsize + os.path.getsize(scandir + item)
                    filescount = filescount + 1
        except (OSError, IOError):  # WindowsError is a subclass of OSError
            pass
The global variables are needed.
Upvotes: 2
Views: 1694
Reputation: 44463
If you want to write portable code for file navigation, you should consider using the functions and constants from the os module (os.path.join, os.sep, os.altsep, ...).
One way you can optimise your code is to remove the recursion and the global variables by using the os.walk function, but it is not going to gain you much. You're going to be limited by the I/O speed of your computer.
import os

def count(directory):
    totalsize = 0
    filecount = 0
    for dirpath, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            try:
                totalsize += os.path.getsize(os.path.join(dirpath, filename))
                filecount += 1
            except OSError:
                pass  # skip files that vanished or cannot be read
    return totalsize, filecount
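If you are on Python 3.6 or later, a variant of the same loop can be written with os.scandir, whose entries cache the file type and, on Windows, the stat data, so it may need fewer system calls per file. This is only a sketch under that assumption; the name count_scandir is made up for illustration, and it counts regular files only.

```python
import os


def count_scandir(directory):
    """Return (total size in bytes, file count) for a directory tree."""
    totalsize = 0
    filecount = 0
    pending = [directory]  # explicit stack instead of recursion
    while pending:
        current = pending.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            # Real subdirectory: visit it later.
                            pending.append(entry.path)
                        elif entry.is_file():
                            # Regular file (or a symlink to one).
                            totalsize += entry.stat().st_size
                            filecount += 1
                    except OSError:
                        pass  # entry vanished or cannot be read
        except OSError:
            pass  # directory cannot be listed
    return totalsize, filecount
```

Called as count_scandir('.'), it returns the same kind of (size, count) tuple as the function above. Whether it is actually faster depends on the platform and file system, so it is worth timing both on your own data.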
Most of the time is going to be spent on syscalls: one to list the files in each directory, and one to get the size of each file. You could probably use Python threads to parallelise the calls to os.stat (indirectly called by os.path.getsize). For once, Python threads would work, as they release the GIL while doing a syscall.
Upvotes: 2
Reputation: 168606
The documentation for os.walk has almost precisely the sample you are asking for:
# from http://docs.python.org/2/library/os.html
import os
from os.path import join, getsize
for root, dirs, files in os.walk('python/Lib/email'):
    print root, "consumes",
    print sum(getsize(join(root, name)) for name in files),
    print "bytes in", len(files), "non-directory files"
    if 'CVS' in dirs:
        dirs.remove('CVS')  # don't visit CVS directories
Changing it to meet your needs is fairly simple:
import os
from os.path import join, getsize
size = 0
count = 0
for root, dirs, files in os.walk('.'):
    size += sum(getsize(join(root, name)) for name in files)
    count += len(files)
print count, size
Upvotes: 3