user1745995

Reputation: 151

Spark application uses only 1 executor

I am running an application with the following code. I don't understand why only 1 executor is in use even though I have 3. When I try to increase the range, my job fails because the task manager loses an executor. In the summary I see a value for shuffle writes, but shuffle reads are 0 (maybe because all the data is on one node, so no shuffle read needs to happen to complete the job).

import org.apache.spark.rdd.RDD

val rdd: RDD[(Int, Int)] = sc.parallelize((1 to 10000000).map(k => k -> 1))
// sortByKeyWithPartition and partitioner are custom (not part of the stock RDD API)
val rdd2 = rdd.sortByKeyWithPartition(partitioner = partitioner)
val sorted = rdd2.map(_._1)
val count_sorted = sorted.collect()
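For reference, here is a minimal, self-contained sketch of the same pipeline using only the stock Spark API (sortByKeyWithPartition and partitioner above are assumed to be custom helpers, so sortByKey with an explicit partition count stands in for them):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sort-example"))

    // numSlices = 3 spreads the data over 3 partitions from the start
    val rdd: RDD[(Int, Int)] = sc.parallelize((1 to 10000000).map(k => k -> 1), numSlices = 3)

    // sortByKey range-partitions the data; numPartitions = 3 keeps all
    // three executors busy during the sort instead of just one
    val sorted = rdd.sortByKey(ascending = true, numPartitions = 3).map(_._1)

    // count() instead of collect() avoids pulling 10M rows onto the driver
    println(sorted.count())

    sc.stop()
  }
}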

Edit: I increased the executor and driver memory and cores. I also changed the number of executors from 4 to 1. That seems to have helped; I now see shuffle reads/writes on each node.
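For anyone tuning the same knobs, these settings can go on the SparkConf (a sketch with example values only; the property names are standard Spark configuration keys, but the numbers are placeholders to tune for your cluster, and spark.driver.memory generally has to be set before the driver JVM starts, e.g. via spark-submit):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("sort-example")
  .set("spark.executor.instances", "4") // number of executors
  .set("spark.executor.cores", "2")     // cores per executor
  .set("spark.executor.memory", "4g")   // heap per executor
  .set("spark.driver.memory", "4g")     // driver heap; matters because collect() pulls all rows to the driver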

Upvotes: 3

Views: 6656

Answers (2)

gsamaras

Reputation: 73444

"...maybe because all the data is on one node"

That should make you suspect that your RDD has only one partition, instead of the 3 or more that would eventually utilize all the executors.

So, extending on Hokam's answer, here's what I would do:

rdd.getNumPartitions

Now if that is 1, then repartition your RDD, like this (repartition returns a new RDD, so assign it to a new val rather than reassigning):

val repartitionedRdd = rdd.repartition(3)

which will redistribute your RDD's data across 3 partitions.

Try executing your code again now.
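Putting the two steps together (a sketch, assuming the rdd from the question):

println(rdd.getNumPartitions)              // prints 1 in the scenario above
val repartitionedRdd = rdd.repartition(3)  // full shuffle into 3 partitions
println(repartitionedRdd.getNumPartitions) // prints 3

Note that repartition triggers a full shuffle, which is why shuffle writes and reads then show up on every node.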

Upvotes: 3

Hokam

Reputation: 924

It looks like your code is ending up with only one partition for the RDD. You should increase the number of partitions of the RDD to at least 3 to utilize all 3 executors.
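For example, the partition count can be set when the RDD is created, instead of repartitioning afterwards (a sketch, assuming the parallelize call from the question; parallelize's second argument, numSlices, controls the initial partition count):

val rdd = sc.parallelize((1 to 10000000).map(k => k -> 1), 3)
println(rdd.getNumPartitions) // 3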

Upvotes: 4
