Reputation: 65
I am running DistCp in Hadoop to copy data from a dev cluster to a production cluster. My question is: which cluster's resources does the job use, the source's or the destination's?
Upvotes: 1
Views: 503
Reputation: 1053
Wherever you initiate the job (that is, wherever you run the DistCp command), it will use the resources of that environment.
Side note: you can initiate the job on either the source or the destination cluster, as long as you give the right source and destination paths.
Upvotes: 1
Reputation: 850
DistCp spins off MapReduce jobs on the cluster it is launched from. You can use the YARN UI on that cluster to monitor the job's progress and resource utilization.
For example, if you are copying from a Prod cluster to a Dev cluster and are worried about resource utilization on Prod, you can run the DistCp job on the Dev cluster and have it "pull" the data from the Prod cluster.
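To make the "pull" setup concrete, here is a minimal sketch that only builds the `hadoop distcp` command line you would run from a node on the Dev cluster. The NameNode hostnames, port, and paths are made-up placeholders, and `build_distcp_command` is a hypothetical helper, not part of Hadoop. The `-m` and `-bandwidth` options are standard DistCp options (maximum simultaneous maps, and MB/s per map); whether you need them depends on your setup.

```python
import shlex


def build_distcp_command(src_uri, dst_uri, max_maps=None, bandwidth_mb=None):
    """Build the argv for `hadoop distcp` (hypothetical helper).

    The MapReduce job runs on whichever cluster this command is
    submitted from, regardless of the order of the two URIs.
    """
    cmd = ["hadoop", "distcp"]
    if max_maps is not None:
        # -m caps the number of simultaneous map tasks (copies)
        cmd += ["-m", str(max_maps)]
    if bandwidth_mb is not None:
        # -bandwidth limits each map to this many MB per second
        cmd += ["-bandwidth", str(bandwidth_mb)]
    cmd += [src_uri, dst_uri]
    return cmd


# Placeholder addresses and paths: substitute your own clusters.
SRC = "hdfs://prod-nn.example.com:8020/data/events"
DST = "hdfs://dev-nn.example.com:8020/data/events"

if __name__ == "__main__":
    # Print the command; run it from a Dev node so Dev supplies the resources.
    print(shlex.join(build_distcp_command(SRC, DST, max_maps=10, bandwidth_mb=50)))
```

Running the printed command on Dev means the map tasks are scheduled on Dev; Prod still serves the reads, so its DataNodes and network see load even though the MapReduce job does not run there.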
Upvotes: 1