Reputation: 663
import os
import time

def getSize(path):
    start_time = time.time()
    totalSize = 0
    if os.path.isdir(path):
        # Walk the whole tree and sum the size of every file found.
        for dirpath, dirnames, filenames in os.walk(path):
            for fName in filenames:
                fp = os.path.join(dirpath, fName)
                totalSize += os.path.getsize(fp)
        print time.time() - start_time, "seconds"
        return totalSize
    else:
        return os.path.getsize(path)
The above function takes around 25 seconds to find the size of a directory that currently contains a lot of files. Could anyone suggest a more efficient way to do the same thing, so that finding the size takes less time?
Upvotes: 3
Views: 1356
Reputation: 154846
The problem is not the size of the data, but the number of (presumably small) files that contain it. I don't see a way to significantly optimize your approach; system utilities like du calculate the size in the same way. Nevertheless, here are several suggestions, ordered by increasing difficulty and effectiveness:
For a small speedup, you could roll your own variant of os.walk that obtains the file size from the same os.stat call used to distinguish between files and directories. This might buy you a second because of the reduced number of syscalls.
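As a rough illustration of that idea, here is a minimal sketch assuming Python 3.6+, where os.scandir exposes the type and size information gathered while listing a directory (on Python 2 the third-party scandir backport offers the same API); error handling for unreadable entries is omitted:

import os

def get_tree_size(path):
    # Sum the sizes of all regular files under *path*, reusing the
    # metadata returned by the directory listing instead of issuing a
    # separate isdir()/getsize() pair of stat calls per entry.
    total = 0
    stack = [path]          # iterative walk, avoids deep recursion
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    total += entry.stat(follow_symlinks=False).st_size
    return total

On Windows the stat() result comes essentially for free with the listing; on Linux it still costs a syscall per file, which is why the gain is modest.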
You could code getSize in Python/C or Cython to avoid interpreter overhead while inspecting a huge number of files and directories. This might buy you a few more seconds, at best.
Change the code that writes the data to also maintain a total size, or a file size index in a single database (think sqlite) that can itself be indexed. This will make the size lookup instantaneous.
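For illustration, here is a hypothetical sketch of such an index using the standard library's sqlite3 module; the database name, table name, and the record_size() hook that the writing code would have to call are all invented for the example:

import os
import sqlite3

conn = sqlite3.connect("sizes.db")
conn.execute("""CREATE TABLE IF NOT EXISTS file_sizes (
                    path TEXT PRIMARY KEY,
                    size INTEGER NOT NULL
                )""")

def record_size(path):
    # Called by the writer right after it finishes writing a file.
    conn.execute("INSERT OR REPLACE INTO file_sizes (path, size) VALUES (?, ?)",
                 (path, os.path.getsize(path)))
    conn.commit()

def get_size(prefix):
    # Instantaneous lookup: sum the recorded sizes under a directory prefix.
    row = conn.execute("SELECT COALESCE(SUM(size), 0) FROM file_sizes WHERE path LIKE ?",
                       (prefix.rstrip(os.sep) + os.sep + "%",)).fetchone()
    return row[0]

With the index kept up to date by the writer, the size query becomes a single SUM instead of a tree walk.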
Monitor the directories being written to using inotify or equivalent, and save the result to a database as before. This will be a net win as long as writes are infrequent compared to reads. It is harder to implement, but it has the benefit of requiring no changes to the code that does the writing.
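As a sketch of the monitoring approach, here is one possible version using the third-party watchdog library (a cross-platform wrapper around inotify and its equivalents); the watched path is a placeholder, and the handler keeps the running total in memory rather than in a database to keep the example short:

import os
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class SizeTracker(FileSystemEventHandler):
    # Keeps a running byte total for one directory tree.
    def __init__(self):
        self.sizes = {}          # path -> last known size
        self.total = 0

    def _update(self, path):
        try:
            new = os.path.getsize(path)
        except OSError:
            new = 0
        self.total += new - self.sizes.get(path, 0)
        self.sizes[path] = new

    def on_created(self, event):
        if not event.is_directory:
            self._update(event.src_path)

    def on_modified(self, event):
        if not event.is_directory:
            self._update(event.src_path)

    def on_deleted(self, event):
        if not event.is_directory:
            self.total -= self.sizes.pop(event.src_path, 0)

tracker = SizeTracker()
observer = Observer()
observer.schedule(tracker, "/path/to/data", recursive=True)
observer.start()
# ... later, tracker.total gives the current size without walking the tree.

The total still has to be seeded once with a full walk when the process starts; after that the tree never needs to be rescanned.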
Upvotes: 2