Reputation: 107
I am a beginner in Spark and I am a bit confused about its behaviour.
I am developing an algorithm in Scala. In this method I create an RDD with a number of partitions specified by the user, like this:
val fichero = sc.textFile(file, numPartitions)
I am developing on a cluster with 12 workers and 216 cores available (18 per node). But when I go to the Spark UI to debug the application, I see the following event timeline for a given stage:
Sorry for the quality of the image, but I had to zoom out a lot. In this execution there are 128 partitions. But, as can be observed in the image, the whole RDD is executed on only two of the twelve available executors, so some tasks are executed sequentially, and I don't want that behaviour.
So the question is: what is happening here? Can I use all the workers so that the tasks execute in parallel? I have seen the option:
spark.default.parallelism
But this option is overridden when choosing the number of partitions to use. I am launching the application with the default parameters of the spark-submit script.
Upvotes: 3
Views: 3276
Reputation: 1407
numPartitions is a hint, not a requirement. It is ultimately passed down to the InputFormat: https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapred/FileInputFormat.html#getSplits(org.apache.hadoop.mapred.JobConf, int). You can always check the actual number of partitions with:
val fichero = sc.textFile(file, numPartitions)
fichero.partitions.size
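If you need an exact partition count rather than a hint, you can force it with a repartition after loading. A minimal sketch, assuming `sc` is your SparkContext and `file`/`numPartitions` stand in for your own values:

```scala
// textFile() treats numPartitions as a minimum hint; the InputFormat
// may produce a different number of splits.
val fichero = sc.textFile(file, numPartitions)
println(s"Partitions after load: ${fichero.partitions.size}")

// repartition() performs a full shuffle but guarantees exactly
// numPartitions partitions afterwards.
val repartitioned = fichero.repartition(numPartitions)
println(s"Partitions after repartition: ${repartitioned.partitions.size}")
```

Note that the guarantee comes at the cost of a shuffle, so only do this when the split count really matters.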
Upvotes: 1
Reputation: 27373
You should set --num-executors to a higher number (the default is 2); you should also look at --executor-cores, which is 1 by default. Try e.g. --num-executors 128.
Make sure that your number of partitions is a multiple (I normally use 2 or 4, depending on the resources needed) of the number of executors times the number of cores per executor.
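For illustration, a launch command following that rule might look like the sketch below. The class name, master URL, and jar are placeholders, and the core/executor counts are just one way to carve up the 12-node, 18-cores-per-node cluster described in the question:

```
# Hypothetical launch: 12 executors x 9 cores = 108 concurrent task slots,
# so 216 partitions (a 2x multiple) would keep every slot busy for two waves.
spark-submit \
  --class com.example.MyApp \
  --master spark://master:7077 \
  --num-executors 12 \
  --executor-cores 9 \
  myapp.jar
```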
See spark-submit --help, and for further reading I can recommend having a look at this (especially "tuning parallelism"): http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
Upvotes: 2