Abhijit Mehetre

Reputation: 53

How to pass Spark job properties to DataProcSparkOperator in Airflow?

I am trying to execute a Spark jar on Dataproc using Airflow's DataProcSparkOperator. The jar is located on GCS; I create a Dataproc cluster on the fly and then execute the jar on the newly created cluster.

I am able to execute this with Airflow's DataProcSparkOperator using the default settings, but I am not able to configure Spark job properties (e.g. --master, --deploy-mode, --driver-memory, etc.). The Airflow documentation was of no help, and the various things I tried did not work either. Help is appreciated.
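For reference, here is a minimal sketch of the setup that works with default settings; the project ID, zone, bucket, jar path, and cluster name are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataproc_operator import (
    DataprocClusterCreateOperator,
    DataProcSparkOperator,
)

# Sketch only: project, zone, bucket and cluster name are placeholders.
with DAG('dataproc_spark_example',
         start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:

    # Create the ephemeral Dataproc cluster.
    create_cluster = DataprocClusterCreateOperator(
        task_id='create_cluster',
        cluster_name='ephemeral-cluster',
        project_id='my-gcp-project',
        num_workers=2,
        zone='us-central1-a',
    )

    # Run the jar stored on GCS against the new cluster.
    run_jar = DataProcSparkOperator(
        task_id='run_spark_job',
        cluster_name='ephemeral-cluster',
        main_jar='gs://my-bucket/jars/my-spark-job.jar',
    )

    create_cluster >> run_jar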

Upvotes: 4

Views: 4021

Answers (1)

Igor Dvorzhak

Reputation: 4457

To configure a Spark job through DataProcSparkOperator you need to use the dataproc_spark_properties parameter.

For example, you can set deployMode like this:

DataProcSparkOperator(
    task_id='spark_job',  # task_id is required for any Airflow operator
    dataproc_spark_properties={'spark.submit.deployMode': 'cluster'})
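A slightly fuller sketch, assuming the operator runs inside the DAG from the question (the cluster name, jar path, and property values below are placeholders): on Dataproc the master is always YARN, so spark-submit flags such as --deploy-mode and --driver-memory are expressed as the equivalent Spark properties.

from airflow.contrib.operators.dataproc_operator import DataProcSparkOperator

# Sketch only: cluster name, jar location and property values are placeholders.
run_spark = DataProcSparkOperator(
    task_id='run_spark_job',
    cluster_name='ephemeral-cluster',
    main_jar='gs://my-bucket/jars/my-spark-job.jar',
    dataproc_spark_properties={
        'spark.submit.deployMode': 'cluster',  # equivalent of --deploy-mode cluster
        'spark.driver.memory': '4g',           # equivalent of --driver-memory 4g
        'spark.executor.memory': '4g',
        'spark.executor.cores': '2',
    },
    dag=dag,
)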

In this answer you can find more details.

Upvotes: 7
