juancito

Reputation: 45

Increase user capacity in Hadoop Capacity Scheduler

I'm new to Hadoop. After coding my MapReduce jobs, I decided to test them on a shared cluster. I initially tested my work on a single node, but later I added 4 nodes to test it on 5 (1+4). The capacity scheduler shows the following information:

Queue configuration
Capacity Percentage: 100.0%
User Limit: 100%
Priority Supported: NO

Map tasks
Capacity: 10 slots
Used capacity: 2 (20.0% of Capacity)
Running tasks: 2
Active users:
User 'juancito': 2 (100.0% of used capacity)

With 1 node I had 2 slots, and with 5 nodes I have 10, so I guess each node has two slots (correct me if I'm wrong). The scheduler says that I'm using only 20% of the capacity.

Does this mean I'm not actually using the 4 nodes I added? Does the number of slots affect the performance of my running jobs? Is there a way to know whether parallelization is actually taking place? If I am not using the 4 new nodes, how do I increase the capacity for user 'juancito' (myself) from 2 to 10 slots so that he can use the full map capacity of the 5 nodes? Thanks.
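The slot arithmetic in the question can be checked with a small sketch. It assumes, as the question guesses, that every node offers the same number of map slots, 2 per node (in Hadoop 1.x that per-node limit comes from `mapred.tasktracker.map.tasks.maximum`, whose default is 2). The function names are hypothetical helpers, not part of any Hadoop API.

```python
def map_capacity(nodes: int, slots_per_node: int) -> int:
    """Total map slots, assuming every node has the same slot count (hypothetical helper)."""
    return nodes * slots_per_node


def used_percentage(running_tasks: int, capacity: int) -> float:
    """Share of map slots occupied by running tasks, as the scheduler page reports it."""
    return 100.0 * running_tasks / capacity


single = map_capacity(nodes=1, slots_per_node=2)   # the original one-node test
cluster = map_capacity(nodes=5, slots_per_node=2)  # 1 + 4 nodes
print(single, cluster)                # 2 10
print(used_percentage(2, cluster))    # 20.0
```

So 2 running tasks out of 10 slots gives exactly the 20% shown in the scheduler output, regardless of how many nodes those 2 tasks happen to run on.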

Upvotes: 1

Views: 219

Answers (1)

cabad

Reputation: 4575

You are only using 20% of the capacity because only 2 of the 10 slots are occupied, and that is because your job only has two map tasks. Hadoop launches one map task per input split, and by default a split corresponds roughly to one HDFS block of an input file. Do you have only two input files (or one input file that is large enough to be divided into 2 splits)?
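To see why a job can end up with only two map tasks, here is a simplified sketch of how `FileInputFormat` (new `org.apache.hadoop.mapreduce` API) cuts a splittable file: the split size is `max(minSize, min(maxSize, blockSize))`, and the file is cut into pieces of that size, except that a trailing remainder up to 1.1 times the split size is kept as one split. It assumes a 64 MB block size (the Hadoop 1.x default), ignores the old `mapred` API's `goalSize` calculation, and treats non-splittable files (for example gzip-compressed ones) as a single split. The file sizes and helper names are hypothetical.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # a trailing chunk up to 10% larger than a split is not cut again


def split_size(block_size: int, min_size: int = 1, max_size: int = 2**63 - 1) -> int:
    """Split size rule: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def num_splits(file_size: int, block_size: int, splittable: bool = True) -> int:
    """Approximate number of input splits (map tasks) for one non-empty file."""
    if file_size <= 0:
        raise ValueError("sketch only handles non-empty files")
    if not splittable:
        return 1  # e.g. a gzip file is read by a single mapper
    size = split_size(block_size)
    splits, remaining = 0, file_size
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    return splits + (1 if remaining > 0 else 0)


def job_map_tasks(file_sizes: list[int], block_size: int = 64 * MB) -> int:
    """One map task per split, summed over all input files."""
    return sum(num_splits(s, block_size) for s in file_sizes)


print(job_map_tasks([10 * MB, 10 * MB]))  # 2  (two small files)
print(job_map_tasks([100 * MB]))          # 2  (one file, two blocks)
print(job_map_tasks([70 * MB]))           # 1  (remainder within the 1.1 slop)
print(job_map_tasks([640 * MB]))          # 10 (enough splits to fill 10 slots)
```

Under these assumptions, only an input of roughly ten blocks or more would keep all 10 map slots busy at once.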

Having more capacity does not mean that your job actually needs it. You could, however, run more jobs at the same time and make better use of your cluster resources.
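A rough model of running several jobs at once: if jobs are submitted together and free slots go to whichever job still has pending map tasks, the busy share of the first wave is `min(total map tasks, slots) / slots`, and a single job with more tasks than slots runs in `ceil(tasks / slots)` waves. This is a simplification that ignores task durations, data locality, reduce slots and the capacity scheduler's queue and user limits; the job sizes and function names are hypothetical.

```python
import math


def first_wave_utilization(map_tasks_per_job: list[int], slots: int) -> float:
    """Percentage of map slots busy when all jobs are submitted at the same time."""
    busy = min(sum(map_tasks_per_job), slots)
    return 100.0 * busy / slots


def waves(map_tasks: int, slots: int) -> int:
    """Rounds of map tasks one job needs if it has every slot to itself."""
    return math.ceil(map_tasks / slots)


SLOTS = 10
print(first_wave_utilization([2], SLOTS))              # 20.0  (one 2-split job)
print(first_wave_utilization([2, 2, 2, 2, 2], SLOTS))  # 100.0 (five such jobs)
print(waves(2, SLOTS), waves(25, SLOTS))               # 1 3
```

In this model a single 2-split job never uses more than 20% of the slots, while five of them together can fill the cluster.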

Upvotes: 1
