Mutating Algorithm
Mutating Algorithm

Reputation: 2758

Parallelizing recursive function using OpenMP in C++

I have the following recursive program which I would like to parallelize using OpenMP:

#include <iostream>
#include <cmath>
#include <numeric>
#include <vector>
#include <algorithm>
#include <thread>
#include <omp.h>


// Determines if a point of dimension point.size() is within the sphere
bool isPointWithinSphere(std::vector<int> point, const double &radius) {

    // Since we know that the sphere is centered at the origin, we can simply
    // find the euclidean distance (square root of the sum of squares) and check to
    // see if it is less than or equal to the length of the radius 

    //square each element inside the point vector
    std::transform(point.begin(), point.end(), point.begin(), [](auto &x){return std::pow(x,2);});

    //find the square root of the sum of squares and check if it is less than or equal to the radius
    return std::sqrt(std::accumulate(point.begin(), point.end(), 0, std::plus<int>())) <= radius;    
}

// Counts the number of lattice points inside the sphere( all points (x1 .... xn) such that xi is an integer )

// The algorithm: If the radius is a floating point value, first find the floor of the radius and cast it to 
// an integer. For example, if the radius is 2.43 then the only integer points we must check are those between
// -2 and 2. We generate these points by simulating n - nested loops using recursion and passing each point
// in to the boolean function isPointWithinSphere(...), if the function returns true, we add one to the count
// (we have found a lattice point on the sphere). 

int countLatticePoints(std::vector<int> point, const double radius, const int dimension, int count = 0) {

    const int R = static_cast<int>(std::floor(radius));

    #pragma omp parallel for
    for(int i = -R; i <= R; i++) {
        point.push_back(i);

        if(point.size() == dimension){
            if(isPointWithinSphere(point, radius)) count++;
        }else count = countLatticePoints(point, radius, dimension, count);

        point.pop_back();

    }

    return count;
}

int main(int argc, char ** argv) {
    std::vector<int> vec;

    #pragma omp parallel
    std::cout << countLatticePoints(vec, 5, 7) << std::endl;   

    return 0;
}

I have tried adding a parallel region within the main function as well as parallelizing the for loop within countLatticePoints yet I see hardly any improvement gained from parallelizing vs running the algorithm sequentially. Any help / advice would be appreciated in terms of other OpenMP strategies that I can use.

Upvotes: 0

Views: 1428

Answers (1)

1201ProgramAlarm
1201ProgramAlarm

Reputation: 32717

I'll take the advice route. Before trying to make your program faster using threads, you first want to make it faster in the single threaded case. There are several improvements you can make. You're making a lot of copies of your point vectors, which incurs a lot of expensive memory allocations.

Pass point into isPointWithinSphere as a reference. Then, rather than two loops, use one loop to square and accumulate the each element in point. Then, when checking the radius, compare the square of the distance rather than the distance. This avoids the sqrt call and replaces it with a simple square.

countLatticePoints should also take point by reference. Rather than calling point.size(), subtract 1 from dimension each time you recurse, then just check for dimension == 1 instead of computing the size.

With all that, if you still want/need to introduce threading, you'll need to make some adjustments due to passing point by reference. countLatticePoint will need to have two variants, the initial call that has the OpenMP directive in it, and the recursive one that does not have them.

The #pragma omp parallel in main won't do anything because there is only one block of code to execute.

Upvotes: 1

Related Questions