Reputation: 6343
I need to delete thousands of files efficiently. At present, the files are deleted sequentially.
I want to speed up the deletions by calling delete asynchronously, using std::async().
Current Flow:
Desired Flow:
I will launch each of the async tasks using std::launch::async, so that it runs on a separate thread.
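The two flows are not shown above, so here is a minimal sketch of what they might look like, assuming C++17 and std::filesystem. The function names delete_sequential and delete_async are hypothetical, and this is only one plausible reading of the question, not the asker's actual code.

```cpp
#include <cstddef>
#include <filesystem>
#include <future>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical "current flow": delete the files one after another
// on the calling thread. Returns how many files were removed.
std::size_t delete_sequential(const std::vector<fs::path>& files) {
    std::size_t removed = 0;
    for (const auto& file : files) {
        std::error_code ec;  // non-throwing overload; errors count as "not removed"
        if (fs::remove(file, ec)) ++removed;
    }
    return removed;
}

// Hypothetical "desired flow": one std::async task per file, forced onto
// its own thread with std::launch::async, then collected with get().
std::size_t delete_async(const std::vector<fs::path>& files) {
    std::vector<std::future<bool>> pending;
    pending.reserve(files.size());
    for (const auto& file : files) {
        pending.push_back(std::async(std::launch::async, [file] {
            std::error_code ec;
            return fs::remove(file, ec);
        }));
    }
    std::size_t removed = 0;
    for (auto& task : pending) {
        if (task.get()) ++removed;  // get() blocks until the task has finished
    }
    return removed;
}
```

Note that delete_async starts one thread per file, which is exactly the cost the questions below ask about.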
I have following questions:
Is async() suited for workloads involving multiple tasks? Or is it better to use threads for such tasks? I read a chapter (Item 35: Prefer task-based programming to thread-based) in Scott Meyers's book "Effective Modern C++", where he recommends task-based programming over thread-based programming.
How costly is each async() call? Does it have overhead comparable to thread creation? I am planning to limit the number of async tasks launched per cycle. For example, if 10,000 files are to be deleted, I will launch just 100 deletes per cycle instead of spawning 10,000 async() tasks in one go. I hope the standard library implementation handles multiple async calls efficiently (for example, by using a thread pool).
The std::future object returned by async() exposes both get() and wait() methods. I read that get() internally calls wait(). Is it enough to call get() on each of the futures stored in a vector?
What if a get() never returns? Is it advisable to use wait_for() with a time out?
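A minimal sketch of the batching idea from the questions above, assuming C++17 and std::filesystem. The names BatchResult and delete_in_batches, and the batch size and timeout parameters, are hypothetical. It illustrates two points: get() on its own already waits for the task, and wait_for() only reports that a task is slow; it does not cancel the task.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <future>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

struct BatchResult {
    std::size_t removed = 0;  // files actually deleted
    std::size_t slow = 0;     // tasks still running when the timeout expired
};

// Hypothetical helper: launch at most batch_size async deletes per cycle,
// then drain the whole batch before starting the next one.
BatchResult delete_in_batches(const std::vector<fs::path>& files,
                              std::size_t batch_size,
                              std::chrono::milliseconds timeout) {
    BatchResult result;
    if (batch_size == 0) batch_size = 1;  // avoid an endless loop
    for (std::size_t begin = 0; begin < files.size(); begin += batch_size) {
        const std::size_t end = std::min(begin + batch_size, files.size());
        std::vector<std::future<bool>> batch;
        batch.reserve(end - begin);
        for (std::size_t i = begin; i < end; ++i) {
            batch.push_back(std::async(std::launch::async, [file = files[i]] {
                std::error_code ec;
                return fs::remove(file, ec);
            }));
        }
        for (auto& task : batch) {
            // wait_for() lets us notice a slow task; it cannot stop it.
            if (task.wait_for(timeout) == std::future_status::timeout) {
                ++result.slow;
            }
            // get() waits for completion by itself, then returns the result.
            if (task.get()) ++result.removed;
        }
    }
    return result;
}
```

Even if the get() call were skipped after a timeout, the destructor of a future obtained from std::async with std::launch::async blocks until the task finishes, so a timeout here is useful for logging rather than for abandoning a stuck delete.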
Upvotes: 0
Views: 1333
Reputation:
The bottleneck is I/O and OS-level file system operations. Spreading the work across thousands of threads is unlikely to relieve that bottleneck; in fact, you are likely to find that it slows things down.
As others have mentioned, depending on the size of the files, it might be better to store the data in an internal database rather than abusing the file system.
Otherwise, I'd probably recommend using one thread for file deletion, then you can just wait (or not wait) for the thread to complete.
To answer one of your questions about how costly async is: the implementation of std::async is compiler- and OS-specific, and its overhead would be comparable to that of the native threading implementation on your machine. Really, the best thing to do is to benchmark it yourself.
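For the benchmarking, a small timing helper along these lines may be enough. This is a sketch only; elapsed_ms is a hypothetical name, and it simply wraps std::chrono::steady_clock around whatever callable you pass in.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run `work` once and return the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename F>
double elapsed_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

You would call it once with the sequential delete and once with the async version, each on a fresh copy of a realistic directory, since file system caching can skew repeated runs.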
Upvotes: 1
Reputation: 29017
As a completely different approach, have you considered moving everything into a database? Deleting thousands of persistent things quickly is just the sort of stuff databases are good at.
Upvotes: 1
Reputation: 29017
You may find this doesn't actually help as much as you would like. The file system is likely to have kernel-level locks (to ensure consistency), and having many threads hitting these locks is likely to cause trouble.
I suggest
Wait for the ten threads to finish.
Experiment with different values of ten.
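One way to read this suggestion, sketched under the assumption of C++17 and std::filesystem: start a small, fixed number of threads that share the list of files, wait for all of them, and make the thread count a parameter so it can be tuned. The name delete_with_workers and the shared atomic index are my own choices for illustration, not necessarily what was meant above.

```cpp
#include <atomic>
#include <cstddef>
#include <filesystem>
#include <system_error>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: delete `files` using `worker_count` threads
// (ten by default). Returns how many files were removed.
std::size_t delete_with_workers(const std::vector<fs::path>& files,
                                std::size_t worker_count = 10) {
    if (worker_count == 0) worker_count = 1;
    std::atomic<std::size_t> next{0};     // index of the next file to claim
    std::atomic<std::size_t> removed{0};  // successful deletions so far

    auto work = [&files, &next, &removed] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= files.size()) return;  // nothing left to claim
            std::error_code ec;             // non-throwing overload
            if (fs::remove(files[i], ec)) removed.fetch_add(1);
        }
    };

    std::vector<std::thread> workers;
    workers.reserve(worker_count);
    for (std::size_t w = 0; w < worker_count; ++w) {
        workers.emplace_back(work);
    }
    // Wait for all of the threads to finish.
    for (auto& worker : workers) worker.join();
    return removed.load();
}
```

Trying worker_count values such as 1, 2, 4 and 10 on the real file system should show fairly quickly whether extra threads help at all.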
Upvotes: 3