Why does using Scala parallelism give slower performance in this case?

Question

The Java class TestClassString has a method, getStringList, that returns a java.util.List of 1,000,000 random Strings.

The object TestViewPerformance records the time taken by its main method, which builds that list and then calls the method TestViewController.iterateList.

The program consistently runs at least 100 ms faster when parallelism is removed from iterateList, i.e. when mySeq.par is changed to mySeq. (The code below shows the sequential version.)
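For reference, the two variants being compared differ only in the generator of the loop: `for (x <- s) body` desugars to `s.foreach(x => body)`, so adding `.par` switches to a parallel collection's foreach. A minimal sketch, assuming Scala 2.12 or earlier (the question's use of JavaConversions implies this), where `.par` is available on any Seq without extra imports; the object and method names are hypothetical.

```scala
object LoopVariants {
  // Sequential: the for loop runs on the calling thread, in order.
  def countSequential(mySeq: Seq[String]): Int = {
    var matches = 0
    for (seqVal <- mySeq) {
      if (seqVal.equalsIgnoreCase("test")) matches += 1
    }
    matches
  }

  // Parallel: mySeq.par yields a ParSeq whose operations split the data
  // and run the chunks as tasks on a thread pool. A shared var would race
  // here, so count is used; it combines the per-chunk results safely.
  def countParallel(mySeq: Seq[String]): Int =
    mySeq.par.count(seqVal => seqVal.equalsIgnoreCase("test"))

  def main(args: Array[String]): Unit = {
    val data = Seq("test", "TEST", "other", "Test", "x")
    println(countSequential(data)) // 3
    println(countParallel(data))   // 3
  }
}
```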

I realise there is a benchmarking tool for measuring Scala performance, as described here: http://docs.scala-lang.org/overviews/parallel-collections/performance.html
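The linked page discusses how JIT warm-up and garbage collection make single JVM timings misleading. As a rough, hedged illustration (not a substitute for a proper benchmarking tool), the sketch below times only the loop with the monotonic System.nanoTime clock and repeats each variant so the early, un-warmed runs can be ignored. It assumes Scala 2.12 or earlier; the names and the run count are arbitrary choices.

```scala
object LoopTiming {
  // Evaluates the block once and returns its result and elapsed milliseconds.
  def timeMillis[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1e6)
  }

  def main(args: Array[String]): Unit = {
    // Build the input once, outside the timed region. The question's main
    // also times the creation of the 1,000,000 strings.
    val data: Seq[String] = Vector.fill(1000000)(java.lang.Math.random().toString)

    // Early runs include JIT compilation and thread-pool start-up;
    // later runs are closer to steady state.
    for (run <- 1 to 10) {
      val (seqCount, seqMs) = timeMillis(data.count(_.equalsIgnoreCase("test")))
      val (parCount, parMs) = timeMillis(data.par.count(_.equalsIgnoreCase("test")))
      println(f"run $run%2d: sequential $seqMs%8.2f ms, parallel $parMs%8.2f ms")
      assert(seqCount == parCount)
    }
  }
}
```

Comparing only the later runs gives a fairer picture than one measurement taken from the start of main.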

But I would still expect this program to run faster with parallelism, judging by the elapsed time in milliseconds. Is all the code inside the .par loop spread over multiple cores?
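One way to check whether the body of a .par loop really runs on several threads is to record the name of the thread that processes each element. A minimal sketch, again assuming Scala 2.12 or earlier; the object name is hypothetical, and the actual thread names depend on the JVM and on the collection's task support.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

object WhichThreads {
  // Returns the distinct names of the threads that ran the loop body.
  def threadsUsed(mySeq: Seq[String]): Set[String] = {
    // A concurrent set, because the body may run on several threads at once.
    val names: java.util.Set[String] = ConcurrentHashMap.newKeySet[String]()
    for (_ <- mySeq.par) {
      names.add(Thread.currentThread().getName)
    }
    names.asScala.toSet
  }

  def main(args: Array[String]): Unit = {
    val data = Vector.fill(1000000)(java.lang.Math.random().toString)
    println(s"available processors: ${Runtime.getRuntime.availableProcessors}")
    println(s"threads used by .par: ${threadsUsed(data)}")
  }
}
```

With a very small input the whole loop may be handled by a single task, so an input of about the question's size gives a more representative result.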

Here is the entire code:

package testpackage

import java.util.Calendar

object TestViewPerformance {

  def main(args: Array[String]): Unit = {

      val before = Calendar.getInstance().getTimeInMillis()

      val testViewController = new TestViewController()
      val testClassString: TestClassString = new TestClassString()

      val folderList = testClassString.getStringList()
      val seq = scala.collection.JavaConversions.asScalaBuffer(folderList)

      /*
       * this method (iterateList) is where the parallelism occurs
       */
      testViewController.iterateList(seq)

      val after = Calendar.getInstance().getTimeInMillis()

      println(before)
      println(after)
      println(after-before)

  }

  class TestViewController {

    def iterateList(mySeq: Seq[String]): Unit = {

      for (seqVal <- mySeq) {
        if (seqVal.equalsIgnoreCase("test")) {

        }
      }
    }

  }

}

package testpackage;

import java.util.ArrayList;
import java.util.List;

public class TestClassString {

    public List<String> getStringList(){

        List<String> l = new ArrayList<String>();

        for(int i = 0; i < 1000000; ++i){
            String test = ""+Math.random();
            l.add(test);
        } 

        return l;
    }

}

JB Nizet · Accepted Answer

It's probably because most of the time in each iteration is spent printing to System.out, which is a synchronized operation and therefore not parallelizable. The cost of starting threads, scheduling them and synchronizing them then makes the parallel iteration slower than the sequential one.
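The effect described above can be sketched by giving every iteration a shared lock. In this minimal example, assuming Scala 2.12 or earlier, each element is appended to one StringBuilder inside a synchronized block, standing in for println (PrintStream.println locks the stream for each call). The object and method names are hypothetical, and the timings printed will vary by machine; the point is that the locked part cannot overlap across threads.

```scala
object SynchronizedBody {
  // Appends every element to a shared buffer, one line per element.
  // Only one thread at a time can be inside out.synchronized, so with
  // .par the threads mostly wait for each other.
  def appendAll(mySeq: Seq[String], parallel: Boolean): String = {
    val out = new StringBuilder
    val body: String => Unit = seqVal =>
      out.synchronized { out.append(seqVal).append('\n') }
    if (parallel) mySeq.par.foreach(body) else mySeq.foreach(body)
    out.synchronized(out.toString)
  }

  def main(args: Array[String]): Unit = {
    val data = Vector.fill(200000)(java.lang.Math.random().toString)
    for (run <- 1 to 5) {
      val t0 = System.nanoTime()
      appendAll(data, parallel = false)
      val t1 = System.nanoTime()
      appendAll(data, parallel = true)
      val t2 = System.nanoTime()
      println(f"run $run: sequential ${(t1 - t0) / 1e6}%.1f ms, parallel ${(t2 - t1) / 1e6}%.1f ms")
    }
  }
}
```

Note that the parallel variant also gives up ordering: the lines come out in whatever order the threads acquire the lock.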
