Adt

Reputation: 333

Airflow Dataproc operator to run shell scripts

Is there any direct way to run shell scripts on a Dataproc cluster? Currently I can run shell scripts through the PySparkOperator (which calls another Python file, and that Python file then calls the shell script). I have searched many links but so far have not found any direct way.

It would be really helpful if anybody could tell me the easiest way.

Upvotes: 1

Views: 2384

Answers (2)

Ashish Kumar

Reputation: 591

You can use Airflow's BashOperator with the following command:

gcloud compute ssh user@server --zone your_cluster_zone \
  --command='Your Command'

Example:

    BashCommand = BashOperator(
        task_id='BashCommand',
        bash_command="gcloud compute ssh user@server --zone your_cluster_zone --command='Your Command'",
        dag=dag)
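
For context, here is a minimal self-contained DAG sketch of the same approach, assuming the Airflow worker is authenticated with gcloud and can SSH to the cluster's master node; the server, zone, and script path are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    # Placeholders -- replace with your own cluster details.
    SSH_TARGET = 'user@your-cluster-m'   # Dataproc master node
    ZONE = 'your_cluster_zone'
    REMOTE_COMMAND = 'bash /path/to/your_script.sh'

    dag = DAG(
        dag_id='dataproc_shell_via_ssh',
        start_date=datetime(2018, 1, 1),
        schedule_interval=None)

    run_shell = BashOperator(
        task_id='run_shell_on_master',
        bash_command="gcloud compute ssh {t} --zone {z} --command='{c}'".format(
            t=SSH_TARGET, z=ZONE, c=REMOTE_COMMAND),
        dag=dag)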

Upvotes: 0

tix

Reputation: 2158

You can run a shell command through a Pig job with the sh operator [1]: gcloud dataproc jobs submit pig ... -e 'sh ls'
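
If you want to drive this from Airflow directly, a minimal sketch using the contrib DataProcPigOperator could look like the following; the cluster name and region are placeholders, and the operator's import path and parameters are assumptions that may differ across Airflow versions:

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.dataproc_operator import DataProcPigOperator

    dag = DAG(
        dag_id='dataproc_shell_via_pig',
        start_date=datetime(2018, 1, 1),
        schedule_interval=None)

    # Pig's sh operator runs an arbitrary shell command on the cluster.
    run_shell = DataProcPigOperator(
        task_id='run_shell_via_pig',
        query="sh ls",                       # replace with your shell command
        cluster_name='your-cluster-name',    # placeholder
        region='your-region',                # placeholder; older Airflow versions may not accept this
        dag=dag)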

I am curious, however, what the end goal is. Why run shell scripts? If your intent is to perform one-time cluster setup, then you should use initialization actions [2].

[1] https://pig.apache.org/docs/r0.9.1/cmds.html#sh

[2] https://cloud.google.com/dataproc/docs/concepts/init-actions

Upvotes: 1
