Reputation: 3
Does anyone know if there is an Airflow operator that can do what the gcloud beta commands do?
I'm trying to launch a Spark job on a GKE cluster. The gcloud beta commands work, but it doesn't work with DataprocSparkOperator: the job keeps spinning but the driver pod is never instantiated. It does work, however, when I run the gcloud command referenced here: https://cloud.google.com/dataproc/docs/concepts/jobs/dataproc-gke
Upvotes: 0
Views: 725
Reputation: 1145
To be completely honest, I believe that Airflow is not intended to run gcloud commands. If there is no operator you can use, it's better to use the Google API in conjunction with PythonOperator.
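As a sketch of the PythonOperator route (all project, cluster and jar names below are placeholders, and it assumes the google-cloud-dataproc client library is installed on the workers):

```python
# Sketch: submit a Spark job through the Dataproc API from a PythonOperator
# callable, instead of shelling out to gcloud. All names are placeholders.

def build_spark_job(cluster_name, main_class, jar_uris):
    """Build the job dict that JobControllerClient.submit_job accepts."""
    return {
        "placement": {"cluster_name": cluster_name},
        "spark_job": {"main_class": main_class, "jar_file_uris": jar_uris},
    }

def submit_spark_job(project_id, region, cluster_name, main_class, jar_uris):
    # Imported lazily so the DAG file still parses if the client
    # library is missing on the scheduler.
    from google.cloud import dataproc_v1

    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    return client.submit_job(
        request={
            "project_id": project_id,
            "region": region,
            "job": build_spark_job(cluster_name, main_class, jar_uris),
        }
    )

# In the DAG, wire it up with PythonOperator, e.g.:
#   PythonOperator(task_id="submit_spark", python_callable=submit_spark_job,
#                  op_kwargs={"project_id": "my-project", ...}, dag=dag)
```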
If you really want to use gcloud commands, you'll need to install the gcloud SDK in your Airflow instance: https://cloud.google.com/sdk/docs/downloads-interactive#silent . It's quite heavy, so if you have Airflow as a Service it will take longer to deploy.
Finally, you'll need to authorize - the service-account approach might be optimal for you: https://cloud.google.com/sdk/gcloud/reference/auth/activate-service-account . You'll have to store the service-account key file in some safe place, e.g. HDFS (if you have a cluster). For local purposes it can be stored locally.
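A minimal sketch of that activation step, assuming a placeholder key-file path reachable from the Airflow workers:

```shell
# Placeholder key-file path; adapt to wherever you store the key safely.
KEY_FILE=/opt/airflow/secrets/dataproc-sa-key.json

# Activate the service account so later gcloud calls are authorized.
# Guarded so this snippet is a no-op where the SDK is not installed.
if command -v gcloud >/dev/null 2>&1; then
  gcloud auth activate-service-account --key-file="$KEY_FILE"
else
  echo "gcloud SDK not installed"
fi
```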
Once authorization is done, just use BashOperator to do what you want - you now have gcloud installed in your Airflow environment.
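For illustration, a sketch of the BashOperator route (cluster name, region and jar path are placeholders, and it assumes a DAG object is already defined):

```python
def gcloud_submit_command(cluster, region, main_class, jar, job_args):
    """Assemble the gcloud command string for BashOperator's bash_command."""
    return (
        f"gcloud dataproc jobs submit spark "
        f"--cluster={cluster} --region={region} "
        f"--class={main_class} --jars={jar} -- {' '.join(job_args)}"
    )

def make_submit_task(dag):
    # Imported lazily so this snippet stays parseable without Airflow installed.
    from airflow.operators.bash_operator import BashOperator

    return BashOperator(
        task_id="submit_spark_via_gcloud",
        bash_command=gcloud_submit_command(
            "my-gke-backed-cluster",  # placeholder Dataproc-on-GKE cluster
            "us-central1",            # placeholder region
            "org.apache.spark.examples.SparkPi",
            "file:///usr/lib/spark/examples/jars/spark-examples.jar",
            ["1000"],
        ),
        dag=dag,
    )
```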
Upvotes: 0