lu_ferra

Reputation: 83

Kafka and Apache Spark Streaming cluster configuration

I need to run some Spark Scala scripts on a cluster of machines. The data are generated by an Apache Kafka producer running on one of these machines.
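Since the data come from a Kafka producer, the script presumably consumes a Kafka topic. For context, here is a minimal sketch of such a consumer, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, group id and batch interval are placeholders, not values taken from the question.

    package com.unimi.lucaf

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object App {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf().setAppName("SparkScript")
        val ssc = new StreamingContext(sparkConf, Seconds(5)) // placeholder batch interval

        // Placeholder consumer settings for the machine running the Kafka producer
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "1.2.3.4:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "spark-script-group",
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          PreferConsistent,
          Subscribe[String, String](Seq("my-topic"), kafkaParams)
        )

        // Just print the incoming values so there is an output operation to execute
        stream.map(_.value).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }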

I have already configured the slaves.template file in Spark's conf directory with the URL of every node of the cluster.

I run the scripts with this command: ./bin/spark-submit --class com.unimi.lucaf.App /Users/lucaferrari/scala-spark-script2/target/scala-spark-script-1.0.jar but it seems that it only runs on the master node.

How can I fix it?

Thanks

SOLVED

  1. In the conf folder, I renamed the slaves.template file to slaves and added the URL of every worker.
  2. In the conf folder, I renamed the spark-env.sh.template file to spark-env.sh and added these lines:

    SPARK_MASTER_HOST=1.2.3.4

    SPARK_MASTER_PORT=7077

    SPARK_MASTER_WEBUI_PORT=4444

    SPARK_WORKER_WEBUI_PORT=8081

  3. In the sbin folder on the master machine, I ran the start-master.sh script.
  4. On every worker, in the sbin folder, I ran start-slave.sh spark://master-url:master-port. The master-url and master-port must match the values configured in the spark-env.sh file.
  5. In the Spark configuration of the script I also set the master URL: val sparkConf = new SparkConf().setAppName("SparkScript").setMaster("spark://master-url:master-port") (see the sketch after this list).
  6. Run the script with ./bin/spark-submit --class com.unimi.lucaf.App --master spark://master-url:master-port --deploy-mode cluster /home/spark1/scala-spark-script2/target/scala-spark-script-1.0.jar. Note that spark-submit options such as --master and --deploy-mode have to come before the application JAR; anything placed after the JAR is passed to the application as its own arguments.
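To make step 5 concrete, here is a minimal, self-contained sketch of that configuration; the object name ClusterCheck is hypothetical, the master URL is the same placeholder used above, and the trivial count job is only there to verify (via the master web UI) that tasks really run on the workers. It is not the original Kafka script.

    package com.unimi.lucaf

    import org.apache.spark.{SparkConf, SparkContext}

    object ClusterCheck {
      def main(args: Array[String]): Unit = {
        // Step 5: point the script at the standalone master started in steps 3-4.
        // Settings made directly on the SparkConf take precedence over the
        // --master flag passed to spark-submit in step 6.
        val sparkConf = new SparkConf()
          .setAppName("SparkScript")
          .setMaster("spark://master-url:master-port") // placeholder, same values as in spark-env.sh

        val sc = new SparkContext(sparkConf)

        // Trivial job, just to confirm that tasks are distributed to the workers
        // (watch the executors on the master web UI while it runs).
        val count = sc.parallelize(1 to 1000000).count()
        println(s"count = $count")

        sc.stop()
      }
    }

Because SparkConf settings take precedence over spark-submit flags, hard-coding setMaster as in step 5 makes the --master flag in step 6 effectively redundant; either one alone is enough.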

Upvotes: 1

Views: 652

Answers (1)

Gus B

Reputation: 106

Have you tried to add the

--master <master_url>

option? If you omit this option, spark-submit will run the application locally.

You may also check Spark's documentation on spark-submit options: https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit
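A common way to follow this advice, sketched below with a hypothetical object name, is to leave setMaster out of the code so that the --master flag (or spark-defaults.conf) decides where the job runs; note that a master set directly on the SparkConf would override the flag.

    import org.apache.spark.SparkConf

    object SubmitTimeMasterExample {
      // No setMaster() here: the master is taken from the --master flag on the
      // spark-submit command line (or from spark-defaults.conf), so the same JAR
      // can be submitted locally or to the cluster without recompiling.
      val sparkConf = new SparkConf().setAppName("SparkScript")
    }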

Upvotes: 1
