Yash Agrawal

Reputation: 11

Connecting to Spark Standalone cluster from Airflow

I have Airflow running locally via a docker-compose file, and a Spark standalone cluster also running locally. I logged into the Airflow worker container and tried to submit a Spark job to the standalone cluster, but the connection to the master node is refused.

  1. Airflow is running on localhost:8080
  2. The Spark standalone cluster is running on localhost:8090
  3. The Spark master is at spark://spark-master:7077

NOTE: I checked that the JAVA_HOME path is properly set and that both the Airflow and Spark standalone containers are running on the same Docker network, but I'm still unable to submit the job.

command: spark-submit --master spark://spark-master:7077 ./dags/my-script.py

I tried every combination of the --master value, but no luck.
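To rule out a plain networking problem before blaming spark-submit, here is a quick TCP reachability check (a minimal sketch using only the Python standard library) that can be run from inside the worker container:

    import socket

    # Check raw TCP reachability of the Spark master from inside the
    # Airflow worker container; a refusal here points at Docker
    # networking rather than at spark-submit itself.
    try:
        with socket.create_connection(("spark-master", 7077), timeout=5):
            print("spark-master:7077 is reachable")
    except OSError as exc:
        print(f"cannot reach spark-master:7077: {exc}")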

Can anyone suggest what I'm missing?

Upvotes: 1

Views: 407

Answers (1)

Samat Kurmanov

Reputation: 42

As far as I remember, you cannot use the SparkSubmitOperator with a standalone cluster. You could try the SSHOperator instead and run spark-submit on the Spark machine itself.
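For reference, a minimal sketch of that approach, assuming a hypothetical Airflow SSH connection named spark_ssh that points at the Spark machine; the script path is a placeholder:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.ssh.operators.ssh import SSHOperator

    # Minimal sketch: run spark-submit on the Spark machine over SSH.
    # "spark_ssh" is an assumed Airflow SSH connection pointing at that
    # machine, and the script path is a placeholder.
    with DAG(
        dag_id="spark_submit_over_ssh",
        start_date=datetime(2023, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        submit_job = SSHOperator(
            task_id="submit_spark_job",
            ssh_conn_id="spark_ssh",
            command=(
                "spark-submit --master spark://spark-master:7077 "
                "/path/to/my-script.py"
            ),
        )

This sidesteps the client-side networking issue entirely, since spark-submit then runs on the Spark host itself.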

Upvotes: 0
