Reputation: 101
I currently have a Python program that calls shutil.rmtree when it finishes, to delete a large number of files that it creates as it executes. This call takes on the order of 20+ seconds. I have profiled it using cProfile, and almost all of that time is spent in posix.remove calls.
If I don't delete these files as part of the Python program but instead run rm -rf on the folder after the program has finished executing, the rm -rf completes in <5 seconds.
Is there something in particular that may be causing this huge difference in execution time?
Upvotes: 3
Views: 1346
Reputation: 106553
shutil.rmtree makes an os.stat system call on every entry it traverses to determine whether it is a file or a directory, which is a massive waste of time since that information is already obtained when the directory is listed.
The os.walk function takes advantage of this information (see PEP 471 for details), and you can use it to implement rmtree yourself:
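    import os

    def rmtree(directory):
        # Walk bottom-up so each directory is already empty when it is removed.
        for root, dirs, files in os.walk(directory, topdown=False):
            for file in files:
                os.remove(os.path.join(root, file))
            for dir in dirs:
                os.rmdir(os.path.join(root, dir))
        # Finally remove the now-empty top-level directory.
        os.rmdir(directory)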
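For illustration, the same idea can also be written directly against os.scandir, the API that PEP 471 introduced. This is only a minimal sketch, assuming a POSIX system: the name rmtree_scandir is made up here, there is no error handling, and it is not hardened against symlink races the way shutil.rmtree is. Symlinks are unlinked rather than followed.

```python
import os


def rmtree_scandir(directory):
    """Recursively delete `directory` (hypothetical helper, no error handling)."""
    # Read the whole listing first so the directory handle is closed
    # before anything is deleted or recursed into.
    with os.scandir(directory) as it:
        entries = list(it)
    for entry in entries:
        # is_dir() normally answers from the type information returned by
        # the directory listing, so no extra stat call is needed per entry.
        if entry.is_dir(follow_symlinks=False):
            rmtree_scandir(entry.path)
        else:
            # Regular files and symlinks (including symlinks that point
            # at directories) are unlinked, never followed. POSIX assumption.
            os.remove(entry.path)
    # Every entry is gone, so the directory itself can be removed.
    os.rmdir(directory)
```

Whether this is actually faster than shutil.rmtree depends on the Python version and the filesystem, so it is worth timing both on the real data.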
Upvotes: 2
Reputation:
Looking at the source for rmtree, it runs a lot of Python code in addition to the minimal amount of native code. Much of that is string processing, which creates many small, short-lived objects. I don't have a profile handy right now, but my guess is that much of the time is spent in the loop body of _rmtree_safe_fd.
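One way to check this guess is to profile the call itself. The following is a minimal sketch: the helper names make_tree and profile_rmtree and the file count are made up for illustration, and the numbers will vary with Python version, filesystem, and cache state.

```python
import cProfile
import io
import os
import pstats
import shutil
import tempfile


def make_tree(n_files=1000):
    """Create a temporary directory holding n_files empty files (illustrative)."""
    root = tempfile.mkdtemp()
    for i in range(n_files):
        open(os.path.join(root, f"f{i}"), "w").close()
    return root


def profile_rmtree(root, top=10):
    """Run shutil.rmtree(root) under cProfile and return the report text."""
    profiler = cProfile.Profile()
    profiler.enable()
    shutil.rmtree(root)
    profiler.disable()
    out = io.StringIO()
    # Sort by the time spent inside each function itself (tottime).
    pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_rmtree(make_tree()))
```

If the top entries are built-in calls, such as the posix.remove reported in the question, rather than Python-level functions, that suggests the time is going to the system calls and not to interpreter overhead.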
Upvotes: 0