Reputation: 5184
This takes around 1 second
(1 to 1000000).map(_+3)
While this gives java.lang.OutOfMemoryError: Java heap space
(1 to 1000000).par.map(_+3)
EDIT:
I have a standard Scala 2.9.2 configuration and I am typing this at the Scala REPL prompt. In the scala launcher script I can see [ -n "$JAVA_OPTS" ] || JAVA_OPTS="-Xmx256M -Xms32M", and I don't have JAVA_OPTS set in my environment, so the default 256 MB heap cap applies.
1 million integers = 8 MB, and creating the list twice = 16 MB, so 256 MB should be more than enough.
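For reference, the heap actually retained by the sequential version can be estimated from plain Scala. This is a rough sketch (not from the question); the object name is made up, System.gc() is only a hint, and exact numbers depend on the JVM and Scala version:

```scala
object HeapEstimate {
  private def usedBytes(): Long = {
    val rt = Runtime.getRuntime
    System.gc() // best-effort hint; the JVM may ignore it
    rt.totalMemory - rt.freeMemory
  }

  def main(args: Array[String]): Unit = {
    val before = usedBytes()
    val xs = (1 to 1000000).map(_ + 3) // the sequential map: completes fine
    val after = usedBytes()
    println("elements: " + xs.length)
    println("approx heap delta MB: " + ((after - before) / (1024 * 1024)))
  }
}
```

The printed delta is only approximate, but it illustrates why the asker's back-of-the-envelope 8-16 MB estimate seems far below a 256 MB cap.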
Upvotes: 5
Views: 994
Reputation: 5787
I had the same problem, but using a ThreadPool seems to get rid of it for me:
import java.util.concurrent.{Executors, ThreadPoolExecutor}
import scala.collection.parallel.ThreadPoolTaskSupport

val threadPool = Executors.newFixedThreadPool(4)
val quadsMinPar = quadsMin.par
quadsMinPar.tasksupport = new ThreadPoolTaskSupport(threadPool.asInstanceOf[ThreadPoolExecutor])
ForkJoin for large collections might be creating too many threads.
Upvotes: 0
Reputation: 32335
Several reasons for the failure:
map means that the range is converted into a vector. For parallel vectors an efficient concatenation has not been implemented yet, so merging the intermediate vectors produced by different processors proceeds by copying, which requires more memory. This will be addressed in future releases.
Upvotes: 3
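The conversion this answer describes can be checked directly: a Range stores only its bounds, but mapping over it materializes every element. A small sketch (the exact runtime class may vary across Scala versions):

```scala
object RangeMaterializes {
  def main(args: Array[String]): Unit = {
    val r = 1 to 1000000   // a Range: O(1) memory, just the bounds and step
    val v = r.map(_ + 3)   // map forces materialization of all elements
    println(r.isInstanceOf[Range])     // true
    println(v.isInstanceOf[Range])     // false: the result is no longer a Range
    println(v.isInstanceOf[Vector[_]]) // true on recent Scala versions
  }
}
```

So even the sequential map allocates a million-element vector; the parallel version additionally pays for the copy-based merging of per-processor intermediate vectors described above.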
Reputation: 61695
There are two issues here: the amount of memory required to store a parallel collection, and the amount of memory required to 'pass through' a parallel collection.
The difference can be seen between these two lines:
(1 to 1000000).map(_+3).toList
(1 to 1000000).par.map(_+3).toList
The REPL stores the evaluated expressions, remember. On my REPL, I can execute both of these 7 times before I run out of memory. The parallel version uses extra memory temporarily while it executes, but once the toList has run, that extra usage can be garbage collected.
(1 to 100000).par.map(_+3)
returns a ParSeq[Int] (in this case a ParVector), which takes up more space than a normal Vector. This one I can execute 4 times before I run out of memory, whereas I can execute this:
(1 to 100000).map(_+3)
11 times before I run out of memory. So parallel collections, if you keep them around, will take up more space.
As a workaround, you can transform them into simpler collections like a List before you return them.
As for why so much space is taken up by parallel collections and why they keep references to so many things, I don't know, but I suspect views [*]. If you think it's a problem, raise an issue for it.
[*] Without any real evidence.
Upvotes: 2
Reputation: 24759
It seems definitely related to the JVM memory options and to the memory required to store a parallel collection. For example:
scala> (1 to 1000000).par.map(_+3)
ended up with an OutOfMemoryError the third time I tried to evaluate it, while
scala> (1 to 1000000).par.map(_+3).seq
never failed. The issue is not the computation, it's the storage of the parallel collection.
Upvotes: 9