Reputation: 333
Is there any direct way to run shell scripts on a Dataproc cluster? I have searched many links but so far have not found any direct way; it would be really helpful if anybody could tell me the easiest one.
Currently I can run them indirectly through the PySpark operator (which calls another Python file, and that Python file then calls the shell script).
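For reference, the indirect route I described above looks roughly like this; the file name and script path are only illustrative:

# shell_runner.py - submitted to the cluster by the PySpark operator;
# the driver just shells out to a script that already exists on the node.
import subprocess

subprocess.check_call(['bash', '/path/to/your_script.sh'])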
Upvotes: 1
Views: 2384
Reputation: 591
You can use the Airflow BashOperator and run the following command:
gcloud compute ssh user@server --zone your_cluster_zone \
--command='Your Command'
Example:
BashCommand = BashOperator(
    task_id='BashCommand',
    bash_command="gcloud compute ssh user@server --zone your_cluster_zone --command='Your Command'",
    dag=dag)
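Put together, a minimal self-contained sketch of the DAG could look like this; the instance name (a Dataproc master node is typically named <cluster-name>-m), the zone, and the script path are placeholders, and it assumes the Airflow worker is allowed to SSH to that node:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG('dataproc_shell_example',
          start_date=datetime(2018, 1, 1),
          schedule_interval=None)

run_shell = BashOperator(
    task_id='run_shell',
    # SSH to the cluster's master node and run the script there
    bash_command="gcloud compute ssh my-cluster-m --zone=us-central1-a "
                 "--command='bash /path/to/your_script.sh'",
    dag=dag)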
Upvotes: 0
Reputation: 2158
You can run a Pig job with the sh operator [1]: gcloud dataproc jobs submit pig ... -e 'sh ls'
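From Airflow this could be wrapped in a BashOperator along these lines; the cluster name, region, and script path are placeholders rather than values from your setup:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG('dataproc_pig_sh_example',
          start_date=datetime(2018, 1, 1),
          schedule_interval=None)

run_pig_sh = BashOperator(
    task_id='run_pig_sh',
    # Pig's sh command runs an arbitrary shell command on the cluster
    bash_command="gcloud dataproc jobs submit pig "
                 "--cluster=my-cluster --region=us-central1 "
                 "-e 'sh bash /path/to/your_script.sh'",
    dag=dag)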
I am curious, however, what the end goal is. Why run shell scripts? If your intent is to perform one-time cluster setup, you should use initialization actions [2].
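For illustration, an initialization action is just a shell script in Cloud Storage that you point to when the cluster is created; a rough sketch, with the bucket, script, cluster name, and region as placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG('create_dataproc_cluster_example',
          start_date=datetime(2018, 1, 1),
          schedule_interval=None)

create_cluster = BashOperator(
    task_id='create_cluster',
    # The init action script in GCS runs on every node at cluster creation time
    bash_command="gcloud dataproc clusters create my-cluster "
                 "--region=us-central1 "
                 "--initialization-actions=gs://my-bucket/setup.sh",
    dag=dag)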
[1] https://pig.apache.org/docs/r0.9.1/cmds.html#sh
[2] https://cloud.google.com/dataproc/docs/concepts/init-actions
Upvotes: 1