Grant

Reputation: 520

Spark Exception in thread "main" java.net.BindException: Failed to bind to: /10.3.100.169:0

When I submitted an application to a standalone cluster, I got this exception.

The weird thing is that it comes and goes. I have already set SPARK_LOCAL_IP to the correct IP address.

But I don't understand why the worker always tries to bind to port 0.

The environment is:

vm1: 10.3.100.169, running master and slave

vm2: 10.3.101.119, running slave

Has anyone run into this issue? Any ideas on how to solve it?

Here are the command line and spark-env.sh:

bin/spark-submit --master spark://10.3.100.169:7077 --deploy-mode cluster --class ${classname} --driver-java-options "-Danalytics.app.configuration.url=http://10.3.100.169:9090/application.conf -XX:+UseG1GC" --conf "spark.executor.extraJavaOptions=-Danalytics.app.configuration.url=http://10.3.100.169:9090/application.conf -XX:+UseG1GC" ${jar}

SPARK_LOCAL_IP=10.3.100.169
SPARK_MASTER_IP=10.3.100.169
SPARK_PUBLIC_DNS=10.3.100.169
SPARK_EXECUTOR_MEMORY=3g
SPARK_EXECUTOR_CORES=2
SPARK_WORKER_MEMORY=3g
SPARK_WORKER_CORES=2

Thanks

Upvotes: 1

Views: 361

Answers (1)

Clyde D'Cruz

Reputation: 2065

If we consider a fresh installation of Spark with its default configuration, the following steps should create a working Spark Standalone cluster.

1. Configure the /etc/hosts file on the master and slaves

Your hosts file on both nodes should look like this:

127.0.0.1 localhost
10.3.100.169 master.example.com master
10.3.101.119 slave.example.com slave
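To confirm that each node's hosts file really contains the names used above, a small check like the following may help. This is only a sketch: the function name check_hosts is made up for illustration, and it only looks at the file contents, not at what the resolver actually returns.

```shell
# check_hosts FILE NAME...
# For each NAME, report whether it appears as a hostname or alias
# (any field after the IP address) in FILE. Text after '#' is ignored.
check_hosts() {
  hosts_file=$1
  shift
  for name in "$@"; do
    if awk -v n="$name" '
      { sub(/#.*/, "") }                                    # strip comments
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }  # skip the IP field
      END { exit !found }                                   # exit 0 only if found
    ' "$hosts_file"; then
      echo "$name: ok"
    else
      echo "$name: missing"
    fi
  done
}

# Example (run on every node):
# check_hosts /etc/hosts master slave
```

Running it on both VMs should print "ok" for every name; a "missing" line points at the node whose hosts file still needs the entry.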

2. Set up password-less SSH between master and workers

On the master, execute the following commands:

# Switch to the user that will run Spark, e.g. 'spark-user'
su - spark-user
ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub spark-user@slave
# Also copy the key to the master itself, since a worker will run there too
ssh-copy-id -i ~/.ssh/id_rsa.pub spark-user@master

Verify that you can SSH from the master to the slave without a password.
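One way to run that verification non-interactively is sketched below. The helper name check_ssh is hypothetical; it relies on OpenSSH's BatchMode option, which makes ssh fail instead of prompting for a password, so any failure (including an unreachable host) is reported as "failed".

```shell
# check_ssh USER@HOST
# Try to run a no-op command over SSH without any interactive prompt.
check_ssh() {
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null; then
    echo "$1: passwordless SSH ok"
  else
    echo "$1: passwordless SSH failed"
    return 1
  fi
}

# Example (run on the master as spark-user):
# check_ssh spark-user@slave
# check_ssh spark-user@master
```

Both targets should report "ok" before you try to start the cluster, since the start scripts log in to each listed host over SSH.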

Refer: setup passwordless ssh

3. Configure the conf/slaves file on all nodes

Your slaves file should look like:

master.example.com
slave.example.com

4. Start the cluster

sbin/start-all.sh

Hope this helps!

Upvotes: 1
