soupybionics

Reputation: 4386

spark-shell declining offers from mesos master

I have been trying to learn Spark on Mesos, but spark-shell just keeps ignoring the offers. Here is my setup:

All the components are in the same subnet

Once spark-shell is up, I run the simplest program possible:

val f = sc.textFile("/tmp/ok.txt")
f.count()

... and I keep getting the following logs in spark-shell:

 (0 + 0) / 2]17/05/21 15:13:34 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/05/21 15:13:49 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/05/21 15:14:04 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Master-side logs (I see these even before doing anything in spark-shell, and they keep coming after I run the above code):

I0521 15:14:12.949108 10166 master.cpp:6992] Sending 2 offers to framework 64c1ef67-9e4f-4236-bb86-80d7aaab540f-0000 (Spark shell) at [email protected]:45596
I0521 15:14:12.955731 10164 master.cpp:4731] Processing DECLINE call for offers: [ 64c1ef67-9e4f-4236-bb86-80d7aaab540f-O34 ] for framework 64c1ef67-9e4f-4236-bb86-80d7aaab540f-0000 (Spark shell) at [email protected]:45596
I0521 15:14:12.956130 10167 master.cpp:4731] Processing DECLINE call for offers: [ 64c1ef67-9e4f-4236-bb86-80d7aaab540f-O35 ] for framework 64c1ef67-9e4f-4236-bb86-80d7aaab540f-0000 (Spark shell) at [email protected]:45596

I am using Mesos 1.2.0 and Spark 2.1.1 on Ubuntu 16.04. I have verified by writing a small Node.js-based HTTP client that the offers coming from the master look fine. What could be going wrong here?
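For reference, the same check can be done with a few lines of Scala instead of Node.js. A minimal sketch, assuming the master listens on the default port 5050 (the address below is a placeholder):

import scala.io.Source

// Dump the master's /state endpoint; the "slaves" array shows each agent's
// resources (cpus, mem, disk, ports), which is what the master advertises
// in the offers it sends to frameworks.
val state = Source.fromURL("http://10.0.0.1:5050/state").mkString
println(state)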

Upvotes: 0

Views: 836

Answers (1)

soupybionics

Reputation: 4386

OK, there were two problems here.

  1. The SPARK_EXECUTOR_URI was local, so I changed it to http (see the sketch after this list). local, I guess, is for Hadoop (correct me here in case I am wrong).

  2. After changing the URI to http, the netty blockmanager service failed while trying to bind to the public IP. (That service runs as part of the Spark executor, which mesos-executor launches as a task in coarse-grained mode; mesos-executor is in turn launched by the Mesos containerizer, which is launched by the mesos-agent.) It failed because I had passed the public IP as the hostname to the mesos-agent, and that is bound to fail in EC2: the public IP is NATed, so the instance cannot actually bind to it. In fact, I was passing the private IP at first, but I don't remember why I changed the hostname to the public IP. Probably to check the sandbox logs: the Mesos master was redirecting me to the mesos-agent's private IP, preventing me from seeing the stderr logs (I am located outside the EC2 VPC). Note that the question above has the private IP being passed to the agent, which is correct. Originally, the question above was posted for the first problem only.
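For illustration, the first fix boils down to pointing the executor URI at an HTTP location the mesos-agent can fetch, instead of a local path. A minimal sketch of the equivalent programmatic setup; the master address and tarball URL are placeholders, and spark.executor.uri is the conf-property counterpart of SPARK_EXECUTOR_URI:

import org.apache.spark.SparkConf

// Sketch only: the addresses and the tarball URL are placeholders.
val conf = new SparkConf()
  .setAppName("Spark shell")
  // 5050 is the default Mesos master port.
  .setMaster("mesos://10.0.0.1:5050")
  // http:// instead of local:// so every mesos-agent can download the tarball.
  .set("spark.executor.uri", "http://10.0.0.1:8000/spark-2.1.1-bin-hadoop2.7.tgz")

The second fix is simply starting the mesos-agent with the private IP as its hostname (e.g. via its --hostname flag), so that the executor's block manager binds to an address the EC2 instance actually owns.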

Upvotes: 0
