Dominic

Reputation: 101

Python shutil.rmtree call taking an extremely long time

I have a Python program that calls shutil.rmtree when it finishes, to delete a large number of files that it creates as it executes. This call takes on the order of 20+ seconds. I have profiled it with cProfile, and almost all of that time is spent in posix.remove calls.

If I don't delete these files as part of the Python program but instead run rm -rf on the folder after the program has finished, the rm -rf completes in under 5 seconds.

Is there something in particular that may be causing this huge difference in execution time?

Upvotes: 3

Views: 1346

Answers (2)

blhsing

Reputation: 106553

shutil.rmtree calls os.stat on every entry it traverses to determine whether it is a file or a directory. That is a massive waste of time, since the same information is already obtained when the directory is listed.

os.walk takes advantage of this information (see PEP 471 for details), so you can use it to implement rmtree yourself:

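import os

def rmtree(directory):
    # Walk bottom-up so every directory is already empty when it is removed.
    for root, dirs, files in os.walk(directory, topdown=False):
        for name in files:
            os.remove(os.path.join(root, name))
        for name in dirs:
            path = os.path.join(root, name)
            # os.walk reports symlinks to directories under dirs; unlink those.
            if os.path.islink(path):
                os.remove(path)
            else:
                os.rmdir(path)
    os.rmdir(directory)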
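
For comparison, here is a minimal sketch of the same idea written directly against os.scandir, the function PEP 471 introduced. The name rmtree_scandir is made up for this illustration and is not part of shutil. It assumes the tree is not being modified by another process, and it recurses, so a very deeply nested tree could hit Python's recursion limit.

```python
import os

def rmtree_scandir(directory):
    """Recursively delete `directory` using the type info os.scandir returns.

    Hypothetical helper for illustration; not a drop-in replacement for
    shutil.rmtree (no error handling, no symlink-attack protection).
    """
    # Read the whole listing first so entries are not deleted mid-iteration.
    with os.scandir(directory) as it:
        entries = list(it)
    for entry in entries:
        # On most platforms the entry type comes from the directory listing,
        # so this check usually needs no extra stat call per entry.
        if entry.is_dir(follow_symlinks=False):
            rmtree_scandir(entry.path)
        else:
            # Regular files and symlinks (including symlinks to directories).
            os.remove(entry.path)
    os.rmdir(directory)
```

Whether this actually beats shutil.rmtree depends on the Python version and the filesystem, so it is worth timing both on your own data before switching.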

Upvotes: 2

user554538


Looking at the source for rmtree, it runs a lot of Python code in addition to the minimal amount of native code. Much of that is string processing, which creates many small, short-lived objects. I don't have a profile handy right now, but my guess is that much of the time is spent in the loop body of _rmtree_safe_fd.
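
One way to check that guess is to profile shutil.rmtree on a throwaway directory. The sketch below is only an illustration: profile_rmtree and its num_files parameter are names invented here, and the internal function names that show up in the report (such as _rmtree_safe_fd) vary between Python versions and platforms.

```python
import cProfile
import io
import os
import pstats
import shutil
import tempfile

def profile_rmtree(num_files=1000):
    """Create a temp directory of empty files and profile shutil.rmtree on it.

    Hypothetical helper for illustration; returns the profile report as text.
    """
    base = tempfile.mkdtemp()
    for i in range(num_files):
        with open(os.path.join(base, "f%d.tmp" % i), "w"):
            pass

    profiler = cProfile.Profile()
    profiler.enable()
    shutil.rmtree(base)
    profiler.disable()

    # Show the ten entries with the largest cumulative time.
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return out.getvalue()

if __name__ == "__main__":
    print(profile_rmtree())
```

Comparing the tottime column for the Python-level functions against the built-in remove or unlink calls shows how much of the run is interpreter overhead and how much is the system calls themselves.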

Upvotes: 0
