Reputation: 179
After I try to parallelize the code below with OpenMP, the elements in the array come out wrong (the order of the elements does not matter to me). Or would it be more convenient to use a C++ std::vector instead of an array for the parallel version? Could you suggest an easy way?
#include <stdio.h>
#include <math.h>

int main()
{
    int n = 100;
    int a[n*(n+1)/2] = {0};
    int count = 0;
    #pragma omp parallel for reduction(+:a,count)
    for (int i = 1; i <= n; i++) {
        for (int j = i + 1; j <= n; j++) {
            double k = sqrt(i * i + j * j);
            if (fabs(round(k) - k) < 1e-10) {
                a[count++] = i;
                a[count++] = j;
                a[count++] = (int) k;
            }
        }
    }
    for (int i = 0; i < count; i++)
        printf("%d %s", a[i], (i+1)%3 ? "" : ", ");
    printf("\ncount: %d", count);
    return 0;
}
Original output:
3 4 5 , 5 12 13 , 6 8 10 , 7 24 25 , 8 15 17 , 9 12 15 , 9 40 41 , 10 24 26 , 11 60 61 , 12 16 20 , 12 35 37 , 13 84 85 , 14 48 50 , 15 20 25 , 15 36 39 , 16 30 34 , 16 63 65 , 18 24 30 , 18 80 82 , 20 21 29 , 20 48 52 , 20 99 101 , 21 28 35 , 21 72 75 , 24 32 40 , 24 45 51 , 24 70 74 , 25 60 65 , 27 36 45 , 28 45 53 , 28 96 100 , 30 40 50 , 30 72 78 , 32 60 68 , 33 44 55 , 33 56 65 , 35 84 91 , 36 48 60 , 36 77 85 , 39 52 65 , 39 80 89 , 40 42 58 , 40 75 85 , 40 96 104 , 42 56 70 , 45 60 75 , 48 55 73 , 48 64 80 , 48 90 102 , 51 68 85 , 54 72 90 , 56 90 106 , 57 76 95 , 60 63 87 , 60 80 100 , 60 91 109 , 63 84 105 , 65 72 97 , 66 88 110 , 69 92 115 , 72 96 120 , 75 100 125 , 80 84 116 ,
count: 189
After using OpenMP (gcc file.c -fopenmp):
411 538 679 , 344 609 711 , 354 533 649 , 218 387 449 , 225 475 534 , 182 283 339 , 81 161 182 , 74 190 204 , 77 138 159 , 79 176 195 , 18 24 30 , 18 80 82 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 ,
count: 189
Upvotes: 1
Views: 899
Reputation: 3031
As an alternative to using a critical section, this solution uses atomics and could therefore be faster.
Be careful: for big n the following code might freeze your computer due to memory consumption!
#include <cstdio>
#include <cmath>
#include <vector>

int main() {
    int const n = 100;
    // without a better (smaller) upper_bound this is extremely
    // wasteful in terms of memory for big n
    long const upper_bound = 3L * static_cast<long>(n) *
                             (static_cast<long>(n) - 1L) / 2L;
    std::vector<int> a(upper_bound, 0);
    int count = 0;

    #pragma omp parallel for schedule(dynamic) shared(a, count)
    for (int i = 1; i <= n; ++i) {
        for (int j = i + 1; j <= n; ++j) {
            double const k = std::sqrt(static_cast<double>(i * i + j * j));
            if (std::fabs(std::round(k) - k) < 1e-10) {
                int my_pos;
                #pragma omp atomic capture
                my_pos = count++;
                a[3 * my_pos] = i;
                a[3 * my_pos + 1] = j;
                a[3 * my_pos + 2] = static_cast<int>(std::round(k));
            }
        }
    }
    count *= 3;

    for (int i = 0; i < count; ++i) {
        std::printf("%d %s", a[i], (i + 1) % 3 ? "" : ", ");
    }
    std::printf("\ncount: %d", count);
    return 0;
}
EDIT:
My answer was initially a reaction to a now-deleted answer that used a critical section in a sub-optimal way. In the following I present another solution which combines a critical section with std::vector::emplace_back() to circumvent the need for upper_bound, similar to Toby Speight's solution. Generally a reduction clause, as in Toby Speight's solution, should be preferred over critical sections and atomics, as reductions should scale better for large numbers of threads. In this particular case (relatively few results are written to a) and without many cores to run on, the following code might still be preferable.
#include <cstdio>
#include <cmath>
#include <tuple>
#include <vector>

int main() {
    int const n = 100;
    std::vector<std::tuple<int, int, int>> a{};
    // optional, might reduce number of reallocations
    a.reserve(2 * n); // 2 * n is an arbitrary choice

    #pragma omp parallel for schedule(dynamic) shared(a)
    for (int i = 1; i <= n; ++i) {
        for (int j = i + 1; j <= n; ++j) {
            double const k = std::sqrt(static_cast<double>(i * i + j * j));
            if (std::fabs(std::round(k) - k) < 1e-10) {
                #pragma omp critical
                a.emplace_back(i, j, static_cast<int>(std::round(k)));
            }
        }
    }
    long const count = 3L * static_cast<long>(a.size());

    for (unsigned long i = 0UL; i < a.size(); ++i) {
        std::printf("%d %d %d\n",
                    std::get<0>(a[i]), std::get<1>(a[i]), std::get<2>(a[i]));
    }
    std::printf("\ncount: %ld", count);
    return 0;
}
Upvotes: 2
Reputation: 30834
Your threads are all accessing the shared count.
You would be better off eliminating count and having each loop iteration determine where to write its output based only on the (per-thread) values of i and j.
Alternatively, use a vector to accumulate the results:
#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

#pragma omp declare \
    reduction(vec_append : std::vector<std::pair<int,int>> : \
              omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))

int main()
{
    constexpr int n = 100'000;
    std::vector<std::pair<int,int>> result;

    #pragma omp parallel for \
        reduction(vec_append:result) \
        schedule(dynamic)
    for (int i = 1; i <= n; ++i) {
        for (int j = i + 1; j <= n; ++j) {
            // 64-bit arithmetic: i*i + j*j overflows int for n this large
            auto const h2 = static_cast<long long>(i) * i
                          + static_cast<long long>(j) * j;  // hypotenuse squared
            long long const h = std::sqrt(h2) + 0.5;        // integer square root
            if (h * h == h2) {
                result.emplace_back(i, j);
            }
        }
    }

    // for (auto const& v: result) {
    //     std::cout << v.first << ' '
    //               << v.second << ' '
    //               << std::hypot(v.first, v.second) << ' ';
    // }
    std::cout << "\ncount: " << result.size() << '\n';
}
Upvotes: 3
Reputation: 573
The count variable is an index into a. The reduction(+:a,count) clause sums the array; it is not the concatenation operation which I think you are looking for.
Updates to count need to be protected by a mutex, something like #pragma omp critical, but I am not an OpenMP expert.
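Applied to the code in the question, that might look roughly like the following sketch (untested; the whole update sits in one critical section so the counter and the three stores stay consistent, and the triples then appear in a nondeterministic order, which the question says is fine):

#include <cmath>
#include <cstdio>

int main() {
    int const n = 100;
    int a[n * (n + 1) / 2] = {0};  // same size as in the question; ample for n = 100
    int count = 0;

    #pragma omp parallel for
    for (int i = 1; i <= n; ++i) {
        for (int j = i + 1; j <= n; ++j) {
            double const k = std::sqrt(static_cast<double>(i * i + j * j));
            if (std::fabs(std::round(k) - k) < 1e-10) {
                // One critical section around the whole update keeps count and
                // the three stores consistent across threads.
                #pragma omp critical
                {
                    a[count++] = i;
                    a[count++] = j;
                    a[count++] = static_cast<int>(std::round(k));
                }
            }
        }
    }

    for (int i = 0; i < count; ++i)
        std::printf("%d %s", a[i], (i + 1) % 3 ? "" : ", ");
    std::printf("\ncount: %d\n", count);
    return 0;
}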
Alternatively, create int a[n][n], set all of its elements to -1 (a sentinel value to indicate "invalid"), then assign the result of the sqrt() when it is near enough to a whole number.
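A possible sketch of that sentinel approach, using a flattened std::vector in place of int a[n][n] so the size can depend on n:

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    int const n = 100;
    // h[i * (n + 1) + j] holds the integer hypotenuse for the pair (i, j),
    // or the sentinel -1 when i, j are not the legs of a Pythagorean triple.
    std::vector<int> h((n + 1) * (n + 1), -1);

    #pragma omp parallel for schedule(dynamic)
    for (int i = 1; i <= n; ++i) {
        for (int j = i + 1; j <= n; ++j) {
            double const k = std::sqrt(static_cast<double>(i * i + j * j));
            if (std::fabs(std::round(k) - k) < 1e-10)
                h[i * (n + 1) + j] = static_cast<int>(std::round(k));
        }
    }

    // Each cell has exactly one owning (i, j) pair, so no counter is shared;
    // a serial pass skips the sentinels when printing.
    int count = 0;
    for (int i = 1; i <= n; ++i) {
        for (int j = i + 1; j <= n; ++j) {
            if (h[i * (n + 1) + j] != -1) {
                std::printf("%d %d %d , ", i, j, h[i * (n + 1) + j]);
                count += 3;
            }
        }
    }
    std::printf("\ncount: %d\n", count);
    return 0;
}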
Upvotes: 2