lu_ferra

Reputation: 83

Kafka and Apache Spark Streaming cluster configuration

I need to run some Spark Scala scripts on a cluster of machines. The data are generated by an Apache Kafka producer running on one of these machines.
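Since the data come from a Kafka producer, the script presumably consumes a Kafka topic. For context, here is a minimal sketch of such a consumer, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, group id and batch interval are placeholders, not values taken from the question.

    package com.unimi.lucaf

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object App {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf().setAppName("SparkScript")
        val ssc = new StreamingContext(sparkConf, Seconds(5)) // placeholder batch interval

        // Placeholder consumer settings for the machine running the Kafka producer
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "1.2.3.4:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "spark-script-group",
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          PreferConsistent,
          Subscribe[String, String](Seq("my-topic"), kafkaParams)
        )

        // Just print the incoming values so there is an output operation to execute
        stream.map(_.value).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }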

I have already configured the slaves.template file in Spark's conf directory with the URL of every node of the cluster.

I run the scripts with this command: ./bin/spark-submit --class com.unimi.lucaf.App /Users/lucaferrari/scala-spark-script2/target/scala-spark-script-1.0.jar but it seems that it only runs on the master node.

How can I fix it?

Thanks

SOLVED

  1. In the conf folder, I renamed the slaves.template file to slaves and added the URL of every worker.
  2. In the conf folder, I renamed the spark-env.sh.template file to spark-env.sh and added these lines:

    SPARK_MASTER_HOST=1.2.3.4

    SPARK_MASTER_PORT=7077

    SPARK_MASTER_WEBUI_PORT=4444

    SPARK_WORKER_WEBUI_PORT=8081

  3. In the sbin folder on the master machine, I ran the start-master.sh script.
  4. On every worker, in the sbin folder, I ran start-slave.sh spark://master-url:master-port. The master-url and master-port must match the values configured in the spark-env.sh file.
  5. In the Spark configuration of the script I also set the master URL: val sparkConf = new SparkConf().setAppName("SparkScript").setMaster("spark://master-url:master-port") (see the sketch after this list).
  6. Run the script with ./bin/spark-submit --class com.unimi.lucaf.App --master spark://master-url:master-port --deploy-mode cluster /home/spark1/scala-spark-script2/target/scala-spark-script-1.0.jar. Note that spark-submit options such as --master and --deploy-mode have to come before the application JAR; anything placed after the JAR is passed to the application as its own arguments.
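To make step 5 concrete, here is a minimal, self-contained sketch of that configuration; the object name ClusterCheck is hypothetical, the master URL is the same placeholder used above, and the trivial count job is only there to verify (via the master web UI) that tasks really run on the workers. It is not the original Kafka script.

    package com.unimi.lucaf

    import org.apache.spark.{SparkConf, SparkContext}

    object ClusterCheck {
      def main(args: Array[String]): Unit = {
        // Step 5: point the script at the standalone master started in steps 3-4.
        // Settings made directly on the SparkConf take precedence over the
        // --master flag passed to spark-submit in step 6.
        val sparkConf = new SparkConf()
          .setAppName("SparkScript")
          .setMaster("spark://master-url:master-port") // placeholder, same values as in spark-env.sh

        val sc = new SparkContext(sparkConf)

        // Trivial job, just to confirm that tasks are distributed to the workers
        // (watch the executors on the master web UI while it runs).
        val count = sc.parallelize(1 to 1000000).count()
        println(s"count = $count")

        sc.stop()
      }
    }

Because SparkConf settings take precedence over spark-submit flags, hard-coding setMaster as in step 5 makes the --master flag in step 6 effectively redundant; either one alone is enough.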

Upvotes: 1

Views: 652

Answers (1)

Gus B

Reputation: 106

Have you tried to add the

--master <master_url>

option? If you omit this option, spark-submit will run the application locally.

You may also check Spark's documentation on spark-submit options: https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit
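A common way to follow this advice, sketched below with a hypothetical object name, is to leave setMaster out of the code so that the --master flag (or spark-defaults.conf) decides where the job runs; note that a master set directly on the SparkConf would override the flag.

    import org.apache.spark.SparkConf

    object SubmitTimeMasterExample {
      // No setMaster() here: the master is taken from the --master flag on the
      // spark-submit command line (or from spark-defaults.conf), so the same JAR
      // can be submitted locally or to the cluster without recompiling.
      val sparkConf = new SparkConf().setAppName("SparkScript")
    }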

Upvotes: 1
