ferdidddb

Reputation: 3

airflow operator to use gcloud beta dataproc commands

Does anyone know if there is an Airflow operator that can do what the gcloud beta commands do? I'm trying to launch a Spark job on a GKE cluster. The gcloud beta commands work, but it is not the case when using DataprocSparkOperator.

With this operator, the job keeps spinning but the driver pod is never instantiated; it does work when running the gcloud command referenced here: https://cloud.google.com/dataproc/docs/concepts/jobs/dataproc-gke

Upvotes: 0

Views: 725

Answers (1)

Domin

Reputation: 1145

To be completely honest, I believe that Airflow is not intended to run gcloud commands. If there is no operator you can use, it's better to use the Google API in conjunction with PythonOperator.
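As a minimal sketch of that approach: the Dataproc REST API (`projects.regions.jobs:submit`) takes a JSON job description, which a PythonOperator callable can build and submit. The payload below is assembled as plain data; the actual authenticated call is shown only in comments, and all project/cluster/jar names are placeholders, not values from the question:

```python
# Sketch: build the request body that the Dataproc REST API
# (projects.regions.jobs:submit) expects for a Spark job, then
# hand the submission to a PythonOperator callable in a DAG.
# All concrete names below are illustrative placeholders.

def build_spark_job_payload(cluster_name, main_class, jar_uris):
    """Return the jobs.submit request body for a Spark job."""
    return {
        "job": {
            "placement": {"clusterName": cluster_name},
            "sparkJob": {
                "mainClass": main_class,
                "jarFileUris": jar_uris,
            },
        }
    }

payload = build_spark_job_payload(
    "my-gke-backed-cluster",
    "org.apache.spark.examples.SparkPi",
    ["file:///usr/lib/spark/examples/jars/spark-examples.jar"],
)

# Inside a DAG, a PythonOperator callable would then do roughly:
#   from googleapiclient.discovery import build
#   dataproc = build("dataproc", "v1")
#   dataproc.projects().regions().jobs().submit(
#       projectId="my-project", region="europe-west1", body=payload
#   ).execute()
```

This keeps the job definition in Python, so you get Airflow retries and templating without shelling out to gcloud at all.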

If you really want to use gcloud commands, you'll need to install the gcloud SDK in your Airflow instance: https://cloud.google.com/sdk/docs/downloads-interactive#silent . It's quite heavy, so if you run Airflow as a service it will take longer to deploy.

Finally, you'll need to authorize - the service-account approach might be optimal for you: https://cloud.google.com/sdk/gcloud/reference/auth/activate-service-account .
You'll have to keep the service-account key in a safe place, e.g. HDFS (if you have a cluster). For local purposes it can be stored locally.
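The activation itself is a one-liner; as a sketch, here it is assembled as the `bash_command` string a BashOperator could run. The account name and key path are placeholders, not real values:

```python
# Sketch: the gcloud service-account activation command, built as
# the string a BashOperator would execute. Account and key path
# are placeholders for your own setup.
SERVICE_ACCOUNT = "airflow-runner@my-project.iam.gserviceaccount.com"
KEY_FILE = "/secrets/airflow-runner-key.json"

activate_cmd = (
    f"gcloud auth activate-service-account {SERVICE_ACCOUNT} "
    f"--key-file={KEY_FILE}"
)

# In a DAG:
#   from airflow.operators.bash import BashOperator
#   auth = BashOperator(task_id="gcloud_auth", bash_command=activate_cmd)
```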

Once authorization is done, just use BashOperator to do what you want - gcloud is now installed in your Airflow environment.
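Putting it together, the submit task could run the same kind of `gcloud beta dataproc jobs submit spark` command the question's linked docs describe. A sketch with placeholder cluster/region/jar values (the Airflow wiring is left in comments, since the flag values depend on your cluster):

```python
# Sketch: a gcloud beta submit command for a Spark job, assembled
# as a BashOperator command string. All flag values are placeholders
# for your own project and cluster.
submit_cmd = (
    "gcloud beta dataproc jobs submit spark "
    "--cluster=my-gke-backed-cluster "
    "--region=europe-west1 "
    "--class=org.apache.spark.examples.SparkPi "
    "--jars=file:///usr/lib/spark/examples/jars/spark-examples.jar"
)

# In a DAG, after the auth task:
#   from airflow.operators.bash import BashOperator
#   submit = BashOperator(task_id="submit_spark", bash_command=submit_cmd)
#   auth >> submit
```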

Upvotes: 0
