Reputation: 2628

Hadoop: specify yarn queue for distcp

On our cluster we have set up dynamic resource pools.

The placement rules are set so that YARN first looks at the specified queue, then at the username, then at the primary group ...

However, with distcp I can't seem to specify a queue; the job just ends up in the primary group's queue.

This is how I run it now (which doesn't work):

 hadoop distcp -Dmapred.job.queue.name:root.default .......

Upvotes: 11

Answers (3)

Th.

Reputation: 21

Similarly, hadoop archive can be instructed to target a custom queue:

hadoop archive -Dmapreduce.job.queuename=<leaf.queue.name> ...

I'll take the opportunity to give a tip for hadoop archive: it creates one map task per destination part file (by default, the destination file size is 2GB), which can lead to thousands of maps when archiving terabytes of data.

The size of the part-* files of a hadoop archive is controlled by the undocumented har.partfile.size property: you can increase it by setting a value (in bytes) higher than 2GiB with -Dhar.partfile.size=<value in bytes>
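To make the byte arithmetic concrete, here is a minimal sketch that converts a part size in GiB to bytes and assembles the command. The queue name, the 8 GiB size, the archive name, and the paths are all made-up placeholders, and the command is only printed, not run.

```shell
# Hypothetical values for illustration; replace with your own.
QUEUE="root.default"    # leaf queue to submit to (assumed)
PART_SIZE_GIB=8         # desired part-* file size in GiB (assumed)

# har.partfile.size expects a value in bytes.
PART_SIZE_BYTES=$(( PART_SIZE_GIB * 1024 * 1024 * 1024 ))

# The -D generic options go right after "hadoop archive".
# Archive name and paths below are placeholders.
CMD="hadoop archive -Dmapreduce.job.queuename=$QUEUE -Dhar.partfile.size=$PART_SIZE_BYTES -archiveName example.har -p /user/example logs /user/example/archives"

# Print the command for review instead of executing it.
echo "$CMD"
```

Once the printed command looks right, it can be pasted into a terminal on a node with the Hadoop client installed.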

Upvotes: 2

Manjunath Ballur

Reputation: 6343

You have made a mistake in how the parameter is specified.

Do not use ":" to separate the key from the value; use "=" instead.

The command should be

 hadoop distcp -Dmapred.job.queue.name=root.default .......
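If this kind of typo keeps creeping into wrapper scripts, a small pre-flight check can catch it before the job is submitted. The sketch below uses a hypothetical helper, is_valid_d_option, which is not part of Hadoop; it only tests that an argument has the joined -Dkey=value shape and does not cover the "-D key=value" form with a space.

```shell
# Hypothetical helper (not a Hadoop command): succeeds only when the
# argument looks like -Dkey=value, i.e. it contains an "=" separator.
is_valid_d_option() {
  case "$1" in
    -D?*=?*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Check the option from the question and the corrected one.
for opt in "-Dmapred.job.queue.name:root.default" "-Dmapred.job.queue.name=root.default"; do
  if is_valid_d_option "$opt"; then
    echo "ok:  $opt"
  else
    echo "bad: $opt (expected -Dkey=value)"
  fi
done
```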

Upvotes: 28

facha

Reputation: 12502

-Dmapreduce.job.queuename=root.default
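For context, mapreduce.job.queuename is the current name of the setting that older releases called mapred.job.queue.name. A minimal sketch of a full distcp invocation using it might look like the following; the namenode hosts and paths are placeholders I made up, and the command is printed rather than executed.

```shell
# Hypothetical values for illustration; replace with your own.
QUEUE="root.default"
SRC="hdfs://nn1.example.com:8020/data/source"   # placeholder source
DST="hdfs://nn2.example.com:8020/data/target"   # placeholder destination

# The -D option must come before the source and destination paths.
CMD="hadoop distcp -Dmapreduce.job.queuename=$QUEUE $SRC $DST"

# Print the command for review instead of executing it.
echo "$CMD"
```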

Upvotes: 10
