Reputation: 1246
As stated in the title, is there a better way to delete multiple files in Python? Currently, I am deleting them by looping through each file:
import os
files = ["test_file.txt", "test_failed.txt"]
for file in files:
    if os.path.exists(file):
        os.remove(file)
Upvotes: 3
Views: 4403
Reputation: 51
For those who see this thread and are looking for syntactic efficiency in deleting all files of a particular type, the following may be useful:
import glob, os
for file in glob.glob("*.txt"): os.remove(file)
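With pathlib (Python 3.4+) the same idea reads a little more naturally, and `unlink(missing_ok=True)` (added in 3.8) also removes the need for an existence check. A sketch, assuming the files live in the current directory:

```python
from pathlib import Path

# Delete every .txt file in the current directory.
# missing_ok=True (Python 3.8+) silently ignores files that
# disappeared between the glob and the unlink.
for path in Path(".").glob("*.txt"):
    path.unlink(missing_ok=True)
```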
Upvotes: 4
Reputation: 43495
Let's put this in perspective.
We start by disassembling the for-loop into bytecode using the dis module:
In [23]: dis.dis('for f in files: os.remove(f)')
1 0 SETUP_LOOP 22 (to 24)
2 LOAD_NAME 0 (files)
4 GET_ITER
>> 6 FOR_ITER 14 (to 22)
8 STORE_NAME 1 (f)
10 LOAD_NAME 2 (os)
12 LOAD_METHOD 3 (remove)
14 LOAD_NAME 1 (f)
16 CALL_METHOD 1
18 POP_TOP
20 JUMP_ABSOLUTE 6
>> 22 POP_BLOCK
>> 24 LOAD_CONST 0 (None)
26 RETURN_VALUE
The only real "inefficiency" here (and a small one at that) is the repeated name lookup for os.remove. So let's get rid of that first by creating a local alias.
In [24]: rm = os.remove
Out[24]: <function posix.remove(path, *, dir_fd=None)>
In [25]: dis.dis('for f in files: rm(f)')
1 0 SETUP_LOOP 20 (to 22)
2 LOAD_NAME 0 (files)
4 GET_ITER
>> 6 FOR_ITER 12 (to 20)
8 STORE_NAME 1 (f)
10 LOAD_NAME 2 (rm)
12 LOAD_NAME 1 (f)
14 CALL_FUNCTION 1
16 POP_TOP
18 JUMP_ABSOLUTE 6
>> 20 POP_BLOCK
>> 22 LOAD_CONST 0 (None)
24 RETURN_VALUE
This saves one bytecode instruction (LOAD_METHOD) per file. :-/
Generally, list comprehensions can be faster than for-loops. But when I tried both using a list of 10 empty but existing files:
In [15]: %timeit -n1 -r1 for f in files: os.remove(f)
71.3 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
compared with a list comprehension using a local alias
In [32]: %timeit -n1 -r1 [rm(f) for f in files]
71 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
there is practically no difference.
Measuring on a recent UNIX system (FreeBSD 12, UFS filesystem on an HDD, using %timeit in IPython):
- os.path.exists() takes around 2 µs per file in a loop.
- os.remove() takes around 7-10 µs per file in a loop.
- Using os.stat directly instead of via exists does not make much of a difference.
And os.remove uses the remove(3) C library call, so most of its time is spent in file system operations, which are inherently really slow compared to a modern CPU.
So apart from writing this in C, using system calls (not C library functions) directly, there is probably not much to be gained.
Upvotes: 5
Reputation: 72735
There are not many ways to speed this up. The final deletion of each file has to be done (at least on Linux, which is what I'm guessing your OS is, based on the filenames) via the unlink(2) system call, which can only delete one file at a time. The file system may do some kind of trickery to give you a degree of parallelism, so it might be possible to get some speed boost by using multiple processes. Here are a few other suggestions.
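Picking up the parallelism idea first, a hedged sketch using concurrent.futures (a thread pool is shown rather than processes, since os.remove just blocks in a system call; whether any of this actually helps depends entirely on the filesystem):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def remove_all(paths, workers=4):
    # os.remove releases the GIL while it blocks in unlink(2),
    # so a thread pool can overlap the per-file syscall latency.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces all deletions to finish and re-raises
        # the first exception (e.g. FileNotFoundError), if any.
        list(pool.map(os.remove, paths))
```

`remove_all` is a made-up helper name; benchmark against the plain loop before committing to it.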
Skip the if os.path.exists check. It runs a stat(2) call to see whether the file exists, adding a fixed cost to every iteration of the loop. It is better to just go ahead and delete the file and, if it doesn't exist, catch the exception, ignore it and move on (forgiveness rather than permission). Upvotes: 0
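The forgiveness-rather-than-permission version of the question's loop would look something like this; contextlib.suppress keeps it compact:

```python
import os
from contextlib import suppress

files = ["test_file.txt", "test_failed.txt"]
for file in files:
    # EAFP: just try to delete; a missing file raises FileNotFoundError,
    # which we ignore, saving the stat(2) that os.path.exists() would cost.
    with suppress(FileNotFoundError):
        os.remove(file)
```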
Reputation: 13120
There's nothing particularly inefficient about what you're doing.
However if you want to delete an entire directory, you can use rmtree.
import shutil
shutil.rmtree('/my-dir/')
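Note that rmtree deletes the directory itself too. If the goal is only to empty a directory of its files while keeping the directory (and any subdirectories) in place, one sketch using os.scandir — clear_directory is a made-up helper name:

```python
import os

def clear_directory(dir_path):
    # Remove only the regular files directly inside dir_path,
    # leaving dir_path itself and its subdirectories untouched.
    with os.scandir(dir_path) as entries:
        for entry in entries:
            if entry.is_file():
                os.remove(entry.path)
```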
Upvotes: 4