Reputation: 6894
i'm trying to usage parallel collections in a very basic way via .par - i expect the collection to be acted on out of order, but that doesn't seem the case:
scala> (1 to 10) map println
1
2
3
4
5
6
7
8
9
10
and
scala> (1 to 10).par map println
1
2
3
4
5
6
7
8
9
10
seems like the order shouldn't be sequential in the latter case. this is with scala 2.9, my machine has 2 cores. is this perhaps a misconfiguration somewhere? thanks!
edit: i did indeed try running with a large set (100k) and the result was still sequential.
Upvotes: 7
Views: 1921
Reputation: 41646
YMMV:
scala> (1 to 10).par map println
1
6
2
3
4
7
5
8
9
This is on a dual core too...
I think if you try enough run you may see different results. Here is a piece of code that shows some of what happens:
import collection.parallel._
import collection.parallel.immutable._
class ParRangeEx(range: Range) extends ParRange(range) {
// Some minimal number of elements after which this collection
// should be handled sequentially by different processors.
override def threshold(sz: Int, p:Int) = {
val res = super.threshold(sz, p)
printf("threshold(%d, %d) returned %d\n", sz, p, res)
res
}
override def splitter = {
new ParRangeIterator(range)
with SignalContextPassingIterator[ParRangeIterator] {
override def split: Seq[ParRangeIterator] = {
val res = super.split
println("split " + res) // probably doesn't show further splits
res
}
}
}
}
new ParRangeEx((1 to 10)).par map println
Some runs I get interspersed processing, some runs I get sequential processing. It seems to split the load in two. If you change the returned threshold number to 11, you'll see that the workload will never be split.
The underlying scheduling mechanism is based on fork-join and work stealing. See the following JSR166 source code for some insights. This is probably what drives whether the same thread will pick up both tasks (and thus seems sequential) or two threads work on each task.
Here is an example output on my computer:
threshold(10, 2) returned 1
split List(ParRangeIterator(over: Range(1, 2, 3, 4, 5)),
ParRangeIterator(over: Range(6, 7, 8, 9, 10)))
threshold(10, 2) returned 1
threshold(10, 2) returned 1
threshold(10, 2) returned 1
threshold(10, 2) returned 1
threshold(10, 2) returned 1
6
7
threshold(10, 2) returned 1
8
1
9
2
10
3
4
5
Upvotes: 11
Reputation: 12631
The answer can very well come out sequentially; there just is no guarantee for it. On such a small set you would normally get it sequentially. The println however invokes a system call, if you run it enough times you probably will get a jumbled version.
Upvotes: 1