Reputation: 73
I have:
I want to run one of the Hadoop utils on master node (hadoop distcp
) programatically. What is the best way to do that? So far I have the next clue: ssh to master node and run util from there. Is there any other option to accomplish the same goal?
Upvotes: 4
Views: 764
Reputation: 4457
To run DistCp you can submit regular Hadoop MR job through Dataproc API or gcloud and specify org.apache.hadoop.tools.DistCp
as a main class:
gcloud dataproc jobs submit hadoop --cluster=<CLUSTER> \
--class=org.apache.hadoop.tools.DistCp -- <SRC> <DST>
From Python you can use either REST API directly or Python Client library to submit DistCp job.
Upvotes: 4