Reputation: 309
I'm trying to write a program that computes the scalar product of two vectors in parallel. For this, I'm using a binary tree structure of threads: initially, each thread computes the scalar product of a specific range in the two vectors, then each thread waits for its two children to finish before terminating itself (for thread with id n
, the two children are threads with id 2 * n + 1
and 2 * n + 2
). In the end, I'll just wait for thread with id 0
(the root of the binary tree) to terminate and I should have the final result.
However, this approach fails to work as I'm getting a strange terminate called without an active exception
error right after the creation of the first thread (thread with id numThreads - 1
).
I've read that this happens whenever a thread goes out of scope and it was not joined. I consider that this is not the case here, no thread goes out of scope and, in the end, I'm waiting for thread 0
which, indirectly, joins all the other existing threads.
Here is the full source code:
#include <iostream>
#include <utility>
#include <vector>
#include <thread>
int leftNode(int index) { return 2 * index + 1; }
int rightNode(int index) { return 2 * index + 2; }
bool hasLeftNode(int numThreads, int index) {
int node = leftNode(index);
return node >= 0 && node < numThreads;
}
bool hasRightNode(int numThreads, int index) {
int node = rightNode(index);
return node >= 0 && node < numThreads;
}
std::vector<std::pair<int, int>> splitWorkload(int n, int numThreads) {
std::vector<std::pair<int, int>> intervals;
int index = 0;
int step = n / numThreads;
int mod = n % numThreads;
while (index < n) {
intervals.push_back(std::pair<int, int>(index, index + step + (mod > 0)));
index += step + (mod > 0);
mod--;
}
return intervals;
}
int scalarProduct(std::vector<int> &a, std::vector<int> &b, int n, int numThreads) {
std::vector<int> sums(n, 0);
std::vector<std::thread> threads;
threads.reserve(numThreads);
std::vector<std::pair<int, int>> intervals = splitWorkload(n, numThreads);
for (int i = numThreads - 1; i >= 0; i--) {
threads[i] = std::thread([&, i]() {
for (int k = intervals[i].first; k < intervals[i].second; k++) {
sums[i] += a[k] * b[k];
}
if (hasLeftNode(numThreads, i)) {
threads[leftNode(i)].join();
sums[i] += sums[leftNode(i)];
}
if (hasRightNode(numThreads, i)) {
threads[rightNode(i)].join();
sums[i] += sums[rightNode(i)];
}
});
}
threads[0].join();
return sums[0];
}
int main() {
int n = 10000;
std::vector<int> a;
for (int i = 0; i < n; i++) {
a.push_back(i);
}
std::vector<int> b;
for (int i = 0; i < n; i++) {
b.push_back(i + 1);
}
int numThreads = 4;
std::cout << scalarProduct(a, b, n, numThreads) << "\n";
return 0;
}
Upvotes: 0
Views: 88
Reputation: 8437
In addition to the point that has mentioned correctly by @vll, for large values of n
, the sum will be really big and cannot be kept inside an int
variable. So, it is better to store them inside a long long
variable:
long long scalarProduct(std::vector<int> &a, std::vector<int> &b, int n, int numThreads)
{
std::vector<long long> sums(n, 0);
// rest of the code
Upvotes: 1