colos_enough
colos_enough

Reputation: 181

Improve performance of parallel calculation of euler number

I am trying to calculate e=∑(3−4k^2/(2k+1)!); k=0..10000 However I got stuck and can't get a desired performance boost using multithreading.

Given a number of threads, I tried to divide the whole sum to k / numberOfThreads chunks and submit futures for each partial sum. I think the bad part might be the factorial calculation or the granularity. I tried with a smaller step, but didn't get a big improvement. Maybe a different approach is needed.

ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
List<Future<BigDecimal>> futures = new ArrayList<>(numberOfThreads);
int step = k / numberOfThreads ;
BigDecimal result = BigDecimal.ZERO;
for (int j = 0; j <= k; j += step) {
    Future<BigDecimal> future = executor.submit(new EulerCalculator(j, j + step));
    futures.add(future);
}
for (Future<BigDecimal> future : futures) {
    result = result.add(future.get());
}
public class EulerCalculator implements Callable<BigDecimal> {
    private int start;
    private int end;

    public BigDecimal call() {
        long numerator = 3 - 4 * start * start;
        BigDecimal denominator = factorial(2 * start + 1);
        BigDecimal partialSum = BigDecimal.valueOf(numerator)
                                .divide(denominator, 1000, RoundingMode.HALF_EVEN);
        for (int i = start + 1 ; i < end; i++) {
            numerator = 3 - 4 * i * i;
            denominator = denominator.multiply(BigDecimal.valueOf(2 * i * (2*i + 1)));
            partialSum = partialSum.add(BigDecimal.valueOf(numerator)
                                        .divide(fact, 1000, RoundingMode.HALF_EVEN));
        }

        return partialSum;
    }

    private BigDecimal factorial(int cur) {
        BigDecimal fact = BigDecimal.ONE;
        for (int i = 2; i <= cur; i++) {
            fact = fact.multiply(BigDecimal.valueOf(i));
        }

        return fact;
    }
}

Best results from a few runs on a quad-core:

k = 10000

threads = 1: 345ms

threads = 2: 216ms

threads = 4: 184ms

threads = 8: 225ms

Upvotes: 2

Views: 917

Answers (3)

Marco R.
Marco R.

Reputation: 2720

Since you need all the denominators and each one depends on ALL previous, I would have a single dedicated thread to compute all of them; and for each denominator computed submit a different task to your thread pool to compute the particular partial sum in parallel. Finally aggregate all results using a parallel stream. The following code show these details:

    public static BigDecimal calculate(int k, int numberOfThreads) {
        ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
        List<Future<BigDecimal>> futures = new ArrayList<>(numberOfThreads);

        BigDecimal denominator = BigDecimal.ONE;
        for (int j = 1; j <= k; j++) {
            denominator = denominator.multiply(BigDecimal.valueOf(4 * j * j + 2 * j));
            Future<BigDecimal> future = executor.submit(computePartialSum(j, denominator));
            futures.add(future);
        }

        return futures.stream().parallel()
            .map(future.get())
            .reduce(BigDecimal.ZERO, BigDecimal::add).add(BigDecimal.valueOf(3));
    }

    public static Callable<BigDecimal> computePartialSum(int curr, BigDecimal denominator) {
        return () -> {
            long numerator = 3 - 4 * curr * curr;
            return BigDecimal.valueOf(numerator).divide(denominator, 1000, RoundingMode.HALF_EVEN);
        };
    }

Still, your bottleneck will be the computation of the factorials; which you can partition into smaller factorial segments and cache them to aggregate into their true values, my two cents.

Complete code on GitHub

Upvotes: 1

colos_enough
colos_enough

Reputation: 181

Thanks for the answers! I cached the factorials with a simple for loop and I got good results for the other calculation:

1 thread = 17ms
2 threads  = 10ms
4 threads = 7ms

However I need to draw a diagram similar to the one below and that will be only possible if I exploit the threads for calculating factorial.

enter image description here

I tested this n! algorithm:

public BigDecimal calculate(int number) {
        if (number == 0 || number == 1) {
            return BigDecimal.ONE;
        }
        List<Callable<BigDecimal>> callables = new ArrayList<>();
        int step = number / processors;
        for (int i = 2; i <= number; i += step + 1) {
            callables.add(new FactorialPartCalculator(i, i + step >= number ? number : i + step));
        }
        List<Future<BigDecimal>> futures = executor.invokeAll(callables);
        BigDecimal result = BigDecimal.ONE;
        for (Future<BigDecimal> future : futures) {
            result = result.multiply(future.get());
        }
        return result;
    }
public class FactorialPartCalculator implements Callable<BigDecimal> {
    @Override
    public BigDecimal call() throws Exception {
        BigDecimal factorialPart = BigDecimal.ONE;
        for (int i = start; i <= end; i++) {
            factorialPart = factorialPart.multiply(BigDecimal.valueOf(i));
        }

        return factorialPart;
    }

I got 6.4x speedup with 6 threads for 20000!. So I need to cache the factorials and include the caching process in the overall time. The program will be tested on 32 processors and I should gain as much speedup as possible

So my question is how can I change the above algorithm to store all the factorials in an array? I only need the odd factorials if that can help.

Upvotes: 0

glee8e
glee8e

Reputation: 6419

Your factorial part is not a constant time operation, but O(n). This means your first thread will have significantly less work than the last thread. Therefore you are not distributing work evenly.

There are generally three methods to tackle this.

You can make uneven step, i.e. larger step for smaller k. This is highly inefficient though, as you are making same multiplication for thousands of times.

You can try to switch to an approximate algorithm to calculate factorial to bring it to constant time. For small k you can use iteration to prevent loss of precision though, as the penalty will be low and there aren't many small k anyway.

Another way is to build a big array holding all factorial that may be used in calculation, which must be run before you submit any task. This caching method lose less precision. See comment below on how to parallelize this process.

Upvotes: 1

Related Questions