Reputation: 113
I have a rather basic problem and not sure where it originates from: lambda capture evaluation in a concurrent environment or misuse of boost filesystem library.
This is sample code:
#include <iostream>
#include <vector>
#include <thread>
#include <boost/filesystem.hpp>
using namespace std;
using namespace boost::filesystem;
void query(const string filename)
{
cout << filename << " ";
}
int main() {
path p("./");
vector<thread> thrs;
for(auto file = directory_iterator(p); file != directory_iterator(); ++file)
{
thread th( [file] {query(file->path().string());} );
thrs.push_back(move(th));
}
for (auto& t : thrs)
t.join();
return 0;
}
which at runtime gives:
:~/workspace/sandbox/Release$ l
main.o makefile objects.mk sandbox* sources.mk subdir.mk
:~/workspace/sandbox/Release$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/a/custom-libs/boost-1_59/lib/ ./sandbox
./subdir.mk ./sources.mk ./sandbox ./objects.mk ./main.o ./main.o
Notice the race condition - not all files end up being passed to thread function (at this run makefile is missing).
I am able to find a workaround by extracting the argument in a local var, rewriting the loop body as:
auto fn = file->path().string();
thread th( [fn] {query(fn);} );
thrs.push_back(move(th));
Where is the race condition coming from?
Isn't file->path().string() evaluated right at the time of thread creation?
Upvotes: 0
Views: 9477
Reputation: 182779
Isn't file->path().string() evaluated right at the time of thread creation?
No. Since query
must be called on the new thread, the statement query(file->path().string())
must be executed on the new thread. So it's executed some time after thread creation when the thread gets around to doing stuff.
You captured file
. Period.
It's conceptually no different from:
string * j = new string ("hello");
thread th( [j] { cout << *j << std::endl; } );
*j = "goodbye";
th.join();
You captured j
. Not *j
. And while j
's value doesn't change, the value of the thing that j
refers to changes. So who knows what it will be when the thread finally dereferences it.
You might think that you're capturing the iterator's value and therefore you'll be okay because that value won't change. Unfortunately, that's just not how this iterator is implemented. It's implemented in a way that allows it to irretrievably discard previous information when it's incremented, so incrementing this type of iterator has the effect of incrementing copies of that iterator too.
If you capture the value of something that refers to something whose value you don't capture, if that second value is changed, the thing you captured now refers to a different value. Always know exactly what you are capturing and how you are capturing it. Sadly, you cannot safely capture by value instances of classes you don't deeply understand.
Upvotes: 3
Reputation: 9406
The code in the thread's lambda function does not get evaluated when the thread is created, but when the thread gets executed. That may be an arbitrary time after creating the thread depending on the OS scheduling.
The real reason why it doesn't work is because a directory_iterator
is a single pass iterator, meaning that it is not possible to copy it, advance one copy and use the old copy again. With capturing, you are exactly doing this. You can modify your program to extract the path before thread creation and capture it:
int main() {
path p("./");
vector<thread> thrs;
for(auto file = directory_iterator(p); file != directory_iterator(); ++file)
{
thrs.emplace_back([path = file->path()] {query(path.string());});
}
for (auto& t : thrs)
t.join();
return 0;
}
Upvotes: 2