Sharadhi Ballal
Sharadhi Ballal

Reputation: 663

Efficient python function to find the size of the directory

def getSize(path):
    start_time = time.time()
    totalSize = 0
    if os.path.isdir(path):

        for dirpath, dirnames, filenames in os.walk(path):
            for fName in filenames:
                fp = os.path.join(dirpath, fName)
                totalSize += os.path.getsize(fp)
        print time.time() - start_time, "seconds"
        return totalSize

    else:
        return os.path.getsize(path)

above function takes around 25 sec to find the size of directory which right now contains lot of files. Couldany one tell me some efficient function to do the same so time to find the size is less?

Upvotes: 3

Views: 1356

Answers (1)

user4815162342
user4815162342

Reputation: 154846

The problem is not in the size of data, but in the number of (presumably small) files that contain it. I don't see a way to significantly optimize your approach — system utilities like du calculate size using the same approach. Nevertheless, here are several suggestions, ordered by increasing difficulty and effectiveness:

  • For a small speedup, you could roll your own variant of os.walk that obtains file size from the same os.stat call used to distinguish between files and directories. This might buy you a second because of the reduced number of syscalls.

  • You could code getSize in Python/C or Cython to avoid interpreter overhead while inspecting a huge number of files and directories. This might by you a few more seconds, at best.

  • Change the code that writes the data to also maintain a total size, or a file size index in a single database (think sqlite) that can itself be indexed. This will make the size lookup instantaneous.

  • Monitor the directories being written to using inotify or equivalent, and save the result to a database as before. This will be a net win work as long as the writes are infrequent compared to reads. It is harder to implement, but it has the benefit of requiring no changes to the code that does the writing.

Upvotes: 2

Related Questions