Reputation: 12653
I have a long-running Python script which creates and deletes temporary files. I notice there is a non-trivial amount of time spent on file deletion, but the only purpose of deleting those files is to ensure that the program doesn't eventually fill up all the disk space during a long run. Is there a cross-platform mechanism in Python to asynchronously delete a file, so the main thread can continue to work while the OS takes care of the file delete?
Upvotes: 9
Views: 5971
Reputation: 136208
You can try delegating the file deletion to another thread or process.
Using a newly spawned thread:
# the arguments must be passed as a tuple
thread.start_new_thread(os.remove, (filename,))
Or, using a process:
# create the process pool once
process_pool = multiprocessing.Pool(1)
results = []
# later on, remove a file asynchronously
# note: need to hold on to the async result until it has completed
results.append(process_pool.apply_async(os.remove, (filename,)))
# every now and then, drop the results that have already completed
results = [r for r in results if not r.ready()]
The process version may allow for more parallelism, because Python threads do not execute in parallel due to the notorious global interpreter lock (GIL). I would expect, though, that the GIL is released when Python calls any blocking kernel function, such as unlink(), so that another thread can make progress. In other words, a background worker thread that calls os.unlink() may be the best solution; see Tim Peters' answer.
Yet multiprocessing uses Python threads underneath to communicate asynchronously with the processes in the pool, so some benchmarking is required to figure out which version gives more parallelism.
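For the benchmarking mentioned above, here is a rough sketch of how one might measure how long the main thread is tied up in each case. It is written for Python 3 and uses concurrent.futures.ThreadPoolExecutor as a stand-in for the hand-rolled thread; that swap, the helper names and the file sizes are my own assumptions, not something from the question. It only times the hand-off, so treat the numbers as indicative at best.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def make_temp_files(count):
    """Create `count` small temporary files and return their paths."""
    paths = []
    for _ in range(count):
        fd, path = tempfile.mkstemp()
        os.write(fd, b"x" * 4096)
        os.close(fd)
        paths.append(path)
    return paths


def time_sync_delete(paths):
    """Seconds the calling thread spends deleting the files itself."""
    start = time.perf_counter()
    for path in paths:
        os.remove(path)
    return time.perf_counter() - start


def time_threaded_handoff(paths):
    """Seconds the calling thread spends handing the files to one worker."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        start = time.perf_counter()
        futures = [pool.submit(os.remove, path) for path in paths]
        handoff = time.perf_counter() - start
        # Leaving the with-block waits for the outstanding deletions.
    for future in futures:
        future.result()  # re-raise any error that happened in the worker
    return handoff


if __name__ == "__main__":
    n = 200  # arbitrary; use something closer to your real workload
    print("sync:   ", time_sync_delete(make_temp_files(n)))
    print("handoff:", time_threaded_handoff(make_temp_files(n)))
```

Run it on the filesystem you actually care about; deletion cost varies a lot between filesystems and platforms.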
An alternative that avoids Python threads, but requires more coding, is to spawn another process and send the filenames to its standard input through a pipe. This way you trade os.remove() for a synchronous os.write() (one write() syscall). It can be done using the deprecated os.popen(); this usage of the function is perfectly safe because it only communicates in one direction, to the child process. A working prototype:
#!/usr/bin/python
from __future__ import print_function
import os, sys

def remover():
    for line in sys.stdin:
        filename = line.strip()
        try:
            os.remove(filename)
        except Exception: # ignore errors
            pass

def main():
    if len(sys.argv) == 2 and sys.argv[1] == '--remover-process':
        return remover()

    remover_process = os.popen(sys.argv[0] + ' --remover-process', 'w')

    def remove_file(filename):
        print(filename, file=remover_process)
        remover_process.flush()

    for file in sys.argv[1:]:
        remove_file(file)

if __name__ == "__main__":
    main()
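If you would rather not rely on os.popen(), the same one-way pipe can probably be set up with subprocess.Popen. This is only a sketch for Python 3: the helper names (start_remover, remove_file, stop_remover) and the inline child script are made up for illustration, and it assumes file names contain no newline characters.

```python
import os
import subprocess
import sys
import tempfile

# Source of the child process: read one path per line and delete it.
REMOVER_SOURCE = r"""
import os, sys
for line in sys.stdin:
    path = line.rstrip("\n")
    try:
        os.remove(path)
    except OSError:  # ignore errors, as in the prototype
        pass
"""


def start_remover():
    """Start the helper process; file names are written to its stdin."""
    return subprocess.Popen(
        [sys.executable, "-c", REMOVER_SOURCE],
        stdin=subprocess.PIPE,
        universal_newlines=True,  # text-mode pipe
    )


def remove_file(remover, filename):
    """Hand one file name to the helper process for deletion."""
    remover.stdin.write(filename + "\n")
    remover.stdin.flush()


def stop_remover(remover):
    """Close the pipe and wait until the helper has drained it."""
    remover.stdin.close()
    return remover.wait()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    remover = start_remover()
    remove_file(remover, path)
    stop_remover(remover)
```

Using sys.executable instead of sys.argv[0] means the script does not have to be executable itself.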
Upvotes: 14
Reputation: 70582
You can create a thread to delete files, following a common producer-consumer pattern:
import threading, Queue # Python 2; the module is named queue in Python 3

dead_files = Queue.Queue()
END_OF_DATA = object() # a unique sentinel value

def background_deleter():
    import os
    while True:
        path = dead_files.get()
        if path is END_OF_DATA:
            return
        try:
            os.remove(path)
        except: # add the exceptions you want to ignore here
            pass # or log the error, or whatever

deleter = threading.Thread(target=background_deleter)
deleter.start()

# when you want to delete a file, do:
# dead_files.put(file_path)

# when you want to shut down cleanly,
dead_files.put(END_OF_DATA)
deleter.join()
CPython releases the GIL (global interpreter lock) around internal file deletion calls, so this should be effective.
I would advise against spawning a new process per delete. On some platforms, process creation is quite expensive. Would also advise against spawning a new thread per delete: in a long-running program, you really never want the possibility of creating an unbounded number of threads at any point. Depending on how quickly file deletion requests pile up, that could happen here.
The "solution" above is wordier than the others, because it avoids all that. There's only one new thread total. Of course it could easily be generalized to use any fixed number of threads instead, all sharing the same dead_files queue. Start with 1, add more if needed ;-)
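A minimal sketch of that generalization, written for Python 3 (so the module is queue rather than Queue). The start_deleters and stop_deleters helpers are names I made up for illustration. The one thing to watch is shutdown: each worker needs its own sentinel.

```python
import os
import queue
import threading

dead_files = queue.Queue()
END_OF_DATA = object()  # a unique sentinel value


def background_deleter():
    while True:
        path = dead_files.get()
        if path is END_OF_DATA:
            return
        try:
            os.remove(path)
        except OSError:
            pass  # or log the error, or whatever


def start_deleters(count):
    """Start `count` worker threads, all reading from dead_files."""
    workers = [threading.Thread(target=background_deleter)
               for _ in range(count)]
    for worker in workers:
        worker.start()
    return workers


def stop_deleters(workers):
    """Send one sentinel per worker, then wait for all of them to exit."""
    for _ in workers:
        dead_files.put(END_OF_DATA)
    for worker in workers:
        worker.join()
```

Since the queue is FIFO, every path queued before stop_deleters() is called gets picked up before the sentinels, so nothing is left behind at shutdown.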
Upvotes: 8
Reputation: 140445
The OS-level file removal primitives are synchronous on both Unix and Windows, so I think you pretty much have to use a worker thread. You could have it pull files to delete off a Queue object, and then when the main thread is done with a file it can just post the file to the queue. If you're using NamedTemporaryFile objects, you probably want to set delete=False in the constructor and just post the name to the queue, not the file object, so you don't have object lifetime headaches.
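Roughly what I have in mind, as a Python 3 sketch; write_scratch_file and the queue/worker names are just placeholders for illustration, not anything from your code:

```python
import os
import queue
import tempfile
import threading

dead_files = queue.Queue()
END_OF_DATA = object()  # sentinel that tells the worker to exit


def background_deleter():
    """Worker thread: delete every path posted to dead_files."""
    while True:
        path = dead_files.get()
        if path is END_OF_DATA:
            return
        try:
            os.remove(path)
        except OSError:
            pass  # or log it


def write_scratch_file(data):
    """Create a temporary file that outlives its file object; return its name."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    return tmp.name  # the file is closed but still on disk


if __name__ == "__main__":
    deleter = threading.Thread(target=background_deleter)
    deleter.start()

    name = write_scratch_file(b"scratch data")
    # ... work with the file through its name ...
    dead_files.put(name)  # post the name, not the file object

    dead_files.put(END_OF_DATA)
    deleter.join()
```

Because only a string travels through the queue, it doesn't matter when the NamedTemporaryFile object itself gets garbage collected.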
Upvotes: 4