AlonL
AlonL

Reputation: 6720

Spark on Mesos is much slower than local

I'm running a Spark Streaming process on a 16 CPU's 64 GB RAM host with Mesos.

When I'm running it using Mesos as a cluster manager (by setting --master mesos://leader.mesos:5050) it's running much slower than when it is run in local mode (--master local[4]).

I can't find the reason for that and I have no clue. One of the things I've noticed is that there is one specific task that is taking significantly more time on Mesos than in Local.

The weird thing (maybe that should be the questions' title) is that the task itself takes 6s and its stage (it has only one stage) takes less than a second. See attached pictures (Mesos (1) and (2)). How come? Isn't a job equal to the sum of its parts?

Local: Spark - Local

Mesos:

(1) Spark - Mesos Stages

(2) Spark - Mesos Job

Another note: I did manage to run this exact same Spark Streaming process on another Mesos cluster, and it runs in a sensible amount of time, pretty much like in the local mode described above. The only difference that I can think of is that this cluster has more than one host, and that Spark is running with 2 executors rather than 1. (I couldn't find a way to run more than 1 executor on the same host on Mesos). Is this may be the reason?

Any clues would be much appreciated.

Upvotes: 1

Views: 410

Answers (1)

Tombart
Tombart

Reputation: 32476

Spark can run over Mesos in two modes: coarse-grained (default) and fine-grained (see documentation).

In coarse-grained mode Spark launches exactly one executor on each machine it was assigned to by Mesos. Inside this task Spark launches other mini-tasks. It has the benefit of lower startup overhead (in your case you don't want to change this mode).

Could you be more specific about your streaming job? Is it CPU, disk, or network bounded? You can easily compare performance if you run some of Spark examples.

If your task is CPU intensive you might consider setting spark.mesos.extra.cores. By default Spark tries to acquire all cores that are being offered by Mesos. So, if there's no other task running on that cluster it shouldn't be a problem.

Upvotes: 1

Related Questions