James Liu
Reputation: 206

How can I list active DISTCP jobs?

I'm running a distcp job between two clusters:

$ hadoop distcp hdfs://x/y /x/y

I want to run this continually, but before starting a new copy I need to make sure any existing distcp jobs have completed.

I've tried the following on both the source and destination clusters, but I cannot see the copy operation:

$ mapred job -list all

Upvotes: 0

Views: 603

Answers (1)

tk421
Reputation: 5947

This is basically a variation on the question "Yarn api get applications by elapsedTime". In your case you can use the ResourceManager (RM) Cluster Applications API to get all the apps (unfortunately it can't filter on name), then keep only the apps whose name equals distcp. The following shows how to filter using jq, where RMURL stands for your ResourceManager's address:

$ curl 'RMURL/ws/v1/cluster/apps' | jq '.apps.app[] | select (.name == "distcp")'

Since you're only interested in active jobs, add the states query parameter to the API call so that only applications in non-terminal states are returned:

$ curl 'RMURL/ws/v1/cluster/apps?states=NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING' |\
    jq '.apps.app[] | select (.name == "distcp")'

http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API
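
To run the copy continually, the same query can be wrapped in a small polling loop. This is only a sketch under a few assumptions: the RM_URL default and the POLL_SECONDS interval are placeholders, the function names are made up for illustration, and it assumes the job keeps the name distcp as in the filter above. The `[]?` is there because the RM may return "apps": null when nothing matches, which would otherwise make jq fail.

```shell
#!/usr/bin/env bash
# Sketch: wait until no DistCp application is active.
# RM_URL and POLL_SECONDS are placeholders, not values from the question.
RM_URL="${RM_URL:-http://resourcemanager.example.com:8088}"
ACTIVE_STATES="NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING"

# Read a Cluster Applications API response on stdin and print how many
# apps are named "distcp". `[]?` yields nothing when "apps" is null.
count_distcp() {
  jq '[.apps.app[]? | select(.name == "distcp")] | length'
}

# Ask the ResourceManager for apps in active states and count the DistCp ones.
active_distcp_count() {
  curl -s "${RM_URL}/ws/v1/cluster/apps?states=${ACTIVE_STATES}" | count_distcp
}

# Poll until the count is exactly 0. If the RM is unreachable the count is
# empty rather than 0, so the loop keeps waiting instead of starting a copy.
wait_for_distcp() {
  while [ "$(active_distcp_count)" != "0" ]; do
    sleep "${POLL_SECONDS:-30}"
  done
}
```

After sourcing the functions, something like `wait_for_distcp && hadoop distcp hdfs://x/y /x/y` would start the next copy only once the previous one has left the active states. Run it against the cluster whose ResourceManager actually schedules the DistCp job.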

Upvotes: 1
