Reputation: 181
I am trying to calculate e=∑(3−4k^2/(2k+1)!); k=0..10000
However I got stuck and can't get a desired performance boost using multithreading.
Given a number of threads, I tried to divide the whole sum to k / numberOfThreads
chunks and submit futures for each partial sum.
I think the bad part might be the factorial calculation or the granularity. I tried with a smaller step, but didn't get a big improvement. Maybe a different approach is needed.
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
List<Future<BigDecimal>> futures = new ArrayList<>(numberOfThreads);
int step = k / numberOfThreads ;
BigDecimal result = BigDecimal.ZERO;
for (int j = 0; j <= k; j += step) {
Future<BigDecimal> future = executor.submit(new EulerCalculator(j, j + step));
futures.add(future);
}
for (Future<BigDecimal> future : futures) {
result = result.add(future.get());
}
public class EulerCalculator implements Callable<BigDecimal> {
private int start;
private int end;
public BigDecimal call() {
long numerator = 3 - 4 * start * start;
BigDecimal denominator = factorial(2 * start + 1);
BigDecimal partialSum = BigDecimal.valueOf(numerator)
.divide(denominator, 1000, RoundingMode.HALF_EVEN);
for (int i = start + 1 ; i < end; i++) {
numerator = 3 - 4 * i * i;
denominator = denominator.multiply(BigDecimal.valueOf(2 * i * (2*i + 1)));
partialSum = partialSum.add(BigDecimal.valueOf(numerator)
.divide(fact, 1000, RoundingMode.HALF_EVEN));
}
return partialSum;
}
private BigDecimal factorial(int cur) {
BigDecimal fact = BigDecimal.ONE;
for (int i = 2; i <= cur; i++) {
fact = fact.multiply(BigDecimal.valueOf(i));
}
return fact;
}
}
Best results from a few runs on a quad-core:
k = 10000
threads = 1: 345ms
threads = 2: 216ms
threads = 4: 184ms
threads = 8: 225ms
Upvotes: 2
Views: 917
Reputation: 2720
Since you need all the denominator
s and each one depends on ALL previous, I would have a single dedicated thread to compute all of them; and for each denominator
computed submit a different task to your thread pool to compute the particular partial sum in parallel. Finally aggregate all results using a parallel stream. The following code show these details:
public static BigDecimal calculate(int k, int numberOfThreads) {
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
List<Future<BigDecimal>> futures = new ArrayList<>(numberOfThreads);
BigDecimal denominator = BigDecimal.ONE;
for (int j = 1; j <= k; j++) {
denominator = denominator.multiply(BigDecimal.valueOf(4 * j * j + 2 * j));
Future<BigDecimal> future = executor.submit(computePartialSum(j, denominator));
futures.add(future);
}
return futures.stream().parallel()
.map(future.get())
.reduce(BigDecimal.ZERO, BigDecimal::add).add(BigDecimal.valueOf(3));
}
public static Callable<BigDecimal> computePartialSum(int curr, BigDecimal denominator) {
return () -> {
long numerator = 3 - 4 * curr * curr;
return BigDecimal.valueOf(numerator).divide(denominator, 1000, RoundingMode.HALF_EVEN);
};
}
Still, your bottleneck will be the computation of the factorials; which you can partition into smaller factorial segments and cache them to aggregate into their true values, my two cents.
Upvotes: 1
Reputation: 181
Thanks for the answers!
I cached the factorials with a simple for
loop and I got good results for the other calculation:
1 thread = 17ms
2 threads = 10ms
4 threads = 7ms
However I need to draw a diagram similar to the one below and that will be only possible if I exploit the threads for calculating factorial.
I tested this n!
algorithm:
public BigDecimal calculate(int number) {
if (number == 0 || number == 1) {
return BigDecimal.ONE;
}
List<Callable<BigDecimal>> callables = new ArrayList<>();
int step = number / processors;
for (int i = 2; i <= number; i += step + 1) {
callables.add(new FactorialPartCalculator(i, i + step >= number ? number : i + step));
}
List<Future<BigDecimal>> futures = executor.invokeAll(callables);
BigDecimal result = BigDecimal.ONE;
for (Future<BigDecimal> future : futures) {
result = result.multiply(future.get());
}
return result;
}
public class FactorialPartCalculator implements Callable<BigDecimal> {
@Override
public BigDecimal call() throws Exception {
BigDecimal factorialPart = BigDecimal.ONE;
for (int i = start; i <= end; i++) {
factorialPart = factorialPart.multiply(BigDecimal.valueOf(i));
}
return factorialPart;
}
I got 6.4x speedup with 6 threads for 20000!
.
So I need to cache the factorials and include the caching process in the overall time. The program will be tested on 32 processors and I should gain as much speedup as possible
So my question is how can I change the above algorithm to store all the factorials in an array? I only need the odd factorials if that can help.
Upvotes: 0
Reputation: 6419
Your factorial part is not a constant time operation, but O(n). This means your first thread will have significantly less work than the last thread. Therefore you are not distributing work evenly.
There are generally three methods to tackle this.
You can make uneven step, i.e. larger step for smaller k. This is highly inefficient though, as you are making same multiplication for thousands of times.
You can try to switch to an approximate algorithm to calculate factorial to bring it to constant time. For small k you can use iteration to prevent loss of precision though, as the penalty will be low and there aren't many small k anyway.
Another way is to build a big array holding all factorial that may be used in calculation, which must be run before you submit any task. This caching method lose less precision. See comment below on how to parallelize this process.
Upvotes: 1