Reputation: 13
import java.util.concurrent.{Executors, TimeUnit}

import scala.annotation.tailrec
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.util.{Failure, Success}

object Fact extends App {

  // Times a by-name block and returns the elapsed time in whole seconds.
  def time[R](block: => R): Long = {
    val t0 = System.nanoTime()
    val result = block // call-by-name
    val t1 = System.nanoTime()
    val t: Long = TimeUnit.SECONDS.convert(t1 - t0, TimeUnit.NANOSECONDS)
    //println("Time taken seconds: " + t)
    t
  }

  // Tail-recursive factorial on BigInt.
  def factorial(n: BigInt): BigInt = {
    @tailrec
    def process(n: BigInt, acc: BigInt): BigInt = {
      //println(acc)
      if (n <= 0) acc
      else process(n - 1, n * acc)
    }
    process(n, 1)
  }

  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newFixedThreadPool(2))

  // Submit 50 factorial computations to the pool; each future yields its own duration in seconds.
  val f1: Future[Stream[Long]] =
    Future.sequence(
      (1 to 50).toStream.map(x => Future { time(factorial(100000)) }))

  f1.onComplete {
    case Success(s) => println("Success : " + s.foldLeft(0L)(_ + _) + " seconds!")
    case Failure(f) => println("Fails " + f)
  }

  import scala.concurrent.duration._
  // Keep the JVM alive: this future runs on the same fixed pool, queued behind the factorial tasks.
  Await.ready(Future { 10 }, 10000.minutes)
}
I have the above factorial code, and it needs to use multiple cores to complete the program faster. To compare, I change the executor:
Executors.newFixedThreadPool(1) to utilize 1 core
Executors.newFixedThreadPool(2) to utilize 2 cores
With 1 core the result appears in 127 seconds, but with 2 cores I get 157 seconds.
My understanding is that increasing the number of cores (parallelism) should improve performance, but it does not. Why?
Please correct me if I am wrong or missing something.
Thanks in advance.
Upvotes: 0
Views: 641
Reputation: 23788
First of all, Dima is right: what you print is the total execution time of all tasks rather than the time until the last task has finished. The difference is that the former sums the time for all the work done in parallel, while only the latter shows the actual speed-up from multi-threading.
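One way to see both numbers is to also time how long it takes for f1 itself to become ready. A minimal sketch, assuming it is placed inside the Fact object after f1 is defined (it reuses the question's time helper; the name wallClockSeconds is just for illustration):

  import scala.concurrent.duration._

  // Wall-clock time until every factorial task has finished,
  // as opposed to the per-task sum printed in onComplete.
  val wallClockSeconds: Long = time(Await.result(f1, 1.hour))
  println("Seconds until f1 is ready: " + wallClockSeconds)

With 1 thread the two numbers should be close; with more threads the wall-clock number is the one that actually tells you whether you got a speed-up.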
However, there is another important effect. When I run this code with 1, 2 and 3 threads and measure both the total time (time until f1 is ready) and the total parallel time (the one that you print), I get the following data (I also reduced the number of calculations from 50 to 20 to speed up my tests):
threads - total time - total parallel time
1 - 70 - 70
2 - 47 - 94
3 - 43 - 126
At first glance this looks OK, since the parallel time divided by the real time is about the same as the number of threads. But if you look a bit closer, you may notice that the speed-up going from 1 thread to 2 is only about 1.5x, and only 1.1x for the third thread. These figures also mean that the total time of all tasks actually goes up when you add threads. This might seem puzzling.
The answer to this puzzle is that your calculation is actually not CPU-bound. The thing is that the result (factorial(100000)) is a pretty big number: it takes about 185 KB of memory to store. This means that in the later stages of the computation your factorial method becomes more memory-bound than CPU-bound, because the intermediate values are big enough to overflow the fastest CPU caches. And this is the reason why adding more threads slows down each individual calculation: you do the arithmetic faster, but memory doesn't get any faster. So once you saturate the CPU caches and then the memory channel, adding more threads (cores) doesn't improve performance that much.
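You can check that figure yourself: a BigInt exposes its size via the standard bitLength method. A quick sketch, assuming the factorial method from the question is in scope:

  // Rough storage size of factorial(100000): bits of the magnitude,
  // divided by 8 bits per byte and 1024 bytes per KB.
  val big = factorial(100000)
  println(big.bitLength / 8 / 1024 + " KB") // prints roughly 185

That is far larger than a typical L1 data cache (usually a few tens of KB), so the last multiplications are dominated by memory traffic rather than arithmetic.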
Upvotes: 1
Reputation: 40510
How are you measuring the time? The result you are printing out is not the time the execution took, but the sum of the times of each individual call.
Running Fact.time(Fact.main(Array.empty)) in the REPL I get 90 and 178 with two and one threads respectively. Seems to make sense ...
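For reference, that measurement is just the question's own time helper wrapped around a full run of main, something like the following in the REPL (the 90/178 figures are the ones quoted above and will differ per machine):

  // After pasting the Fact object into the REPL:
  // with newFixedThreadPool(2) this returned about 90 (seconds),
  // with newFixedThreadPool(1) about 178.
  val wholeRun: Long = Fact.time(Fact.main(Array.empty))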
Upvotes: 2