Yuan Liang

Reputation: 11

Spark over YARN: some tasks are extremely slow

I am using a cluster of 12 virtual machines, each with 16 GB of memory and 6 cores (except the master node, which has only 2 cores). Each worker node was assigned 12 GB of memory and 4 cores.

When I submit a Spark application to YARN, I set the number of executors to 10 (one node acts as the cluster manager and one hosts the application master), and to maximize the parallelism of my application, most of my RDDs have 40 partitions, the same as the total number of cores across all executors.
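For reference, a minimal sketch of a configuration along these lines (the app name, input path, and exact memory value are placeholders, not my actual submission):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: mirrors the setup described above (10 executors, 4 cores each),
    // with executor memory kept below the 12 GB per worker to leave room for YARN overhead.
    val conf = new SparkConf()
      .setAppName("my-app")
      .set("spark.executor.instances", "10")
      .set("spark.executor.cores", "4")
      .set("spark.executor.memory", "10g")
    val sc = new SparkContext(conf)

    // 40 partitions ~= total executor cores (10 executors x 4 cores)
    val rdd = sc.textFile("hdfs:///path/to/input", 40)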

The following is the problem I encountered: in some random stages, some tasks take far longer to process than others, which results in poor parallelism. As we can see in the first screenshot, executor 9 spent over 30 s on its tasks while the other tasks finished within 1 s. Furthermore, the dominant cost also varies from run to run: sometimes it is just computation, but sometimes it is scheduler delay, deserialization, or shuffle read. As we can see, the breakdown in the second screenshot differs from the first.

My guess is that once a task gets assigned to a specific slot, there are not enough resources on the corresponding machine, so the JVM ends up waiting for CPUs. Is my guess correct? And how should I configure my cluster to avoid this situation?

[screenshots: task time breakdowns showing computing time, scheduler delay and deserialization]

Upvotes: 1

Views: 1120

Answers (1)

Arnon Rotem-Gal-Oz

Reputation: 25909

To get a specific answer you'd need to share more about what you're doing, but most likely the partitions in one or more of your stages are unbalanced, i.e. some are much bigger than others. The result is a slowdown, since each partition is handled by a single task. One way to solve it is to increase the number of partitions or change the partitioning logic. Also, when a big task finishes, shipping its data to the other tasks takes longer as well, which is why other tasks may appear slow too.
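As a sketch of both options (the input path, key extraction, and partition counts here are made up for illustration, not taken from the question):

    import org.apache.spark.{SparkConf, SparkContext, HashPartitioner}

    // Sketch only: placeholder job showing the two approaches mentioned above.
    val sc = new SparkContext(new SparkConf().setAppName("repartition-example"))
    val lines = sc.textFile("hdfs:///path/to/input")

    // Option 1: raise the partition count so no single task owns a huge slice of the data.
    val finer = lines.repartition(160)

    // Option 2: for a key-value RDD with skewed keys, control the partitioning logic
    // explicitly (a custom Partitioner could spread hot keys across more partitions).
    val counts = lines.flatMap(_.split("\\s+"))
      .map(w => (w, 1L))
      .partitionBy(new HashPartitioner(160))
      .reduceByKey(_ + _)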

Upvotes: 1
