Sam

Reputation: 1246

Is there a way to delete multiple files efficiently?

As stated in the title, is there a better way to delete multiple files in Python? Currently, I am deleting them by looping over each file.

import os
files = ["test_file.txt", "test_failed.txt"]
for file in files:
    if os.path.exists(file):
        os.remove(file)

Upvotes: 3

Views: 4403

Answers (4)

dmc-au

Reputation: 51

For those who see this thread and are looking for syntactic efficiency in deleting all files of a particular type, the following may be useful:

import glob, os
for file in glob.glob("*.txt"): os.remove(file)
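If you prefer pathlib (available since Python 3.4), an equivalent sketch:

from pathlib import Path

# Delete every .txt file in the current directory.
for p in Path(".").glob("*.txt"):
    p.unlink()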

Upvotes: 4

Roland Smith

Reputation: 43495

Let's put this in perspective.

We start by disassembling the for-loop into bytecode using the dis module:

In [23]: dis.dis('for f in files: os.remove(f)')
  1           0 SETUP_LOOP              22 (to 24)
              2 LOAD_NAME                0 (files)
              4 GET_ITER
        >>    6 FOR_ITER                14 (to 22)
              8 STORE_NAME               1 (f)
             10 LOAD_NAME                2 (os)
             12 LOAD_METHOD              3 (remove)
             14 LOAD_NAME                1 (f)
             16 CALL_METHOD              1
             18 POP_TOP
             20 JUMP_ABSOLUTE            6
        >>   22 POP_BLOCK
        >>   24 LOAD_CONST               0 (None)
             26 RETURN_VALUE

The only real "inefficiency" here (and a small one at that) is the repeated name lookup for os.remove. So let's get rid of that by creating a local alias for it first.

In [24]: rm = os.remove
Out[24]: <function posix.remove(path, *, dir_fd=None)>

In [25]: dis.dis('for f in files: rm(f)')
  1           0 SETUP_LOOP              20 (to 22)
              2 LOAD_NAME                0 (files)
              4 GET_ITER
        >>    6 FOR_ITER                12 (to 20)
              8 STORE_NAME               1 (f)
             10 LOAD_NAME                2 (rm)
             12 LOAD_NAME                1 (f)
             14 CALL_FUNCTION            1
             16 POP_TOP
             18 JUMP_ABSOLUTE            6
        >>   20 POP_BLOCK
        >>   22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

This saves one bytecode instruction (LOAD_METHOD) per file. :-/

Generally, list comprehensions can be faster than for-loops. But when I tried both on a list of 10 empty but existing files:

In [15]: %timeit -n1 -r1 for f in files: os.remove(f)
71.3 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

compared with a list comprehension using the local alias:

In [32]: %timeit -n1 -r1 [rm(f) for f in files]
71 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

there is practically no difference.

Measuring on a recent UNIX system (FreeBSD 12, UFS filesystem on an HDD, using %timeit in IPython):

  • os.path.exists() takes around 2 µs per file in a loop.
  • os.remove() takes around 7-10 µs per file in a loop.

Using os.stat directly instead of going through os.path.exists does not make much of a difference. And os.remove uses the remove(3) C library call, so most of its time is spent in file-system operations, which are inherently slow compared to the speed of a modern CPU.
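For reference, a minimal sketch of the direct os.stat check, roughly what os.path.exists boils down to internally:

import os

def exists_via_stat(path):
    # Roughly what os.path.exists does: try stat(2) and
    # treat any OSError as "does not exist".
    try:
        os.stat(path)
    except OSError:
        return False
    return True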

So apart from rewriting this in C and using system calls (not C library functions) directly, there is probably not much to be gained.

Upvotes: 5

Noufal Ibrahim

Reputation: 72735

There are not many ways to speed this up. The final deletion of the file has to be done (at least on Linux, which is what I'm guessing your OS is, based on the filenames) using the unlink(2) system call, which can only delete one file at a time. The file system may do some kind of trickery to give you a degree of parallelism, so it might be possible to get a speed boost by using multiple processes (a rough sketch follows the list below). Here are a few other suggestions.

  1. Lose the if os.path.exists. It runs a stat(2) call to check whether the file exists, which adds a fixed cost to every iteration of the loop. It's better to just go ahead and delete, and if the file doesn't exist, catch the exception, ignore it, and move on (forgiveness rather than permission); see the first sketch after this list.
  2. If all these files are in one directory, it might make sense to delete the entire directory instead (maybe even by shelling out to do it). You should measure this, though.
  3. How are you going to get the list of files? That can be time-consuming as well. If it's going to be read from a file or generated, factor that into your optimisation too.
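A minimal sketch of point 1, reusing the files list from the question:

import os

files = ["test_file.txt", "test_failed.txt"]
for file in files:
    try:
        os.remove(file)
    except FileNotFoundError:
        # Already gone; nothing to do.
        pass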
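And a rough sketch of the multi-process idea, using concurrent.futures as one possible mechanism; process startup overhead can easily outweigh any gain for small lists, so measure before adopting it:

import os
from concurrent.futures import ProcessPoolExecutor

def remove_quietly(path):
    # Each call ends in an unlink(2) in a worker process.
    try:
        os.remove(path)
    except FileNotFoundError:
        pass

if __name__ == "__main__":
    files = ["test_file.txt", "test_failed.txt"]
    with ProcessPoolExecutor() as pool:
        # list() forces the map to complete and surfaces any exceptions.
        list(pool.map(remove_quietly, files))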

Upvotes: 0

Ken Kinder

Reputation: 13120

There's nothing particularly inefficient about what you're doing.

However, if you want to delete an entire directory and everything in it, you can use shutil.rmtree:

import shutil
shutil.rmtree('/my-dir/')

Upvotes: 4
