tsuda7

Reputation: 413

Explanation of YARN's DRF

I'm reading the 4th edition of "Hadoop: The Definitive Guide" and came across this explanation of YARN's DRF (in Chapter 4, "Dominant Resource Fairness"):

Imagine a cluster with a total of 100 CPUs and 10 TB of memory. Application A requests containers of (2 CPUs, 300 GB), and application B requests containers of (6 CPUs, 100 GB). A’s request is (2%, 3%) of the cluster, so memory is dominant since its proportion (3%) is larger than CPU’s (2%). B’s request is (6%, 1%), so CPU is dominant. Since B’s container requests are twice as big in the dominant resource (6% versus 3%), it will be allocated half as many containers under fair sharing.

I cannot understand the meaning of "it will be allocated half as many containers under fair sharing". I guess "it" here is Application B, and that Application B is allocated half as many containers as Application A. Is that right? Why is Application B allocated fewer containers even though it requires more resources?

Any suggestions or pointers to explanatory documentation would be much appreciated. Thank you in advance.

Upvotes: 10

Views: 6345

Answers (1)

Manjunath Ballur

Reputation: 6343

The Dominant Resource Calculator is based on the concept of Dominant Resource Fairness (DRF).

To understand DRF, you can refer to the paper here: https://people.eecs.berkeley.edu/~alig/papers/drf.pdf

In this paper, refer to section 4.1, where an example is given.

DRF tries to equalise the dominant shares of the applications: here, A's share of the cluster's memory should equal B's share of the cluster's CPUs.

Explanation

Total Resources Available: 100 CPUs, 10000 GB Memory

Requirements of Application A: 2 CPUs, 300 GB Memory

Requirements of Application B: 6 CPUs, 100 GB Memory

A's dominant resource is Memory (2% of CPUs vs 3% of Memory)

B's dominant resource is CPU (6% of CPUs vs 1% of Memory)
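
As a rough illustration of how the dominant resource is picked, here is a minimal Python sketch. The names `CLUSTER` and `dominant_resource` are made up for this example and are not part of YARN; the function just computes each resource's share of the cluster and returns the largest one.

```python
from fractions import Fraction

# Cluster capacity from the example above (hypothetical variable name).
CLUSTER = {"cpu": 100, "memory_gb": 10_000}

def dominant_resource(request, capacity=CLUSTER):
    """Return (resource name, share of the cluster) for the largest share."""
    # Share of each resource that one container would take.
    shares = {name: Fraction(amount, capacity[name])
              for name, amount in request.items()}
    name = max(shares, key=shares.get)
    return name, shares[name]

print(dominant_resource({"cpu": 2, "memory_gb": 300}))  # application A
print(dominant_resource({"cpu": 6, "memory_gb": 100}))  # application B
```

For A the memory share (3/100) beats the CPU share (2/100); for B the CPU share (6/100) beats the memory share (1/100).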

Let's assume that "A" is assigned x containers and "B" is assigned y containers.

  1. Resource requirements of A

    2x CPUs + 300x GB Memory (2 CPUs and 300 GB Memory for each container)
    
  2. Resource requirements of B:

    6y CPUs + 100y GB Memory (6 CPUs and 100 GB Memory for each container)
    
  3. The total requirements must fit within the cluster:

    2x + 6y <= 100 CPUs
    
    300x + 100y <= 10000 GB Memory
    
  4. DRF will try to equalise the dominant needs of A and B.

    A's dominant need: 300x / 10000 GB (300x out of 10000 GB of total memory)
    
    B's dominant need: 6y / 100 CPUs (6y out of 100 CPUs)
    
    DRF will try to equalise: (300x / 10000) = (6y / 100)
    
    Solving the above equation gives: x = 2y
    

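If you substitute x = 2y into the constraints in step 3, the CPU constraint becomes 10y <= 100 (so y <= 10) and the memory constraint becomes 700y <= 10000 (so y <= 14 whole containers). The CPUs run out first, so you get y = 10 and x = 20.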

It means:

  • Application A is allocated 20 containers: (40 CPUs, 6000 GB of Memory)
  • Application B is allocated 10 containers: (60 CPUs, 1000 GB of Memory)
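
The substitution can be checked with a few lines of Python. This is only a sketch of the arithmetic for this specific example; the function name `solve_equal_dominant_shares` is hypothetical and the coefficients 10 and 700 come from plugging x = 2y into the two constraints.

```python
def solve_equal_dominant_shares(cpu_total=100, mem_total=10_000):
    """Largest whole-container allocation with x = 2y that fits the cluster."""
    # With x = 2y:
    #   CPU:    2*(2y) + 6y     = 10y  <= cpu_total
    #   Memory: 300*(2y) + 100y = 700y <= mem_total
    y = min(cpu_total // 10, mem_total // 700)  # the tighter constraint wins
    x = 2 * y
    return x, y

x, y = solve_equal_dominant_shares()
print(f"A gets {x} containers, B gets {y} containers")
```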

You can see that:

Total allocated CPU is: 40 + 60 = 100 CPUs (all 100 available CPUs are used)

Total allocated Memory is: 6000 + 1000 = 7000 GB <= 10000 GB of Memory available

So, the above solution explains the meaning of the sentence:

Since B’s container requests are twice as big in the dominant resource (6% versus 3%), it will be allocated half as many containers under fair sharing.
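
The same result can also be reached by simulation. The sketch below loosely follows the greedy idea described in the DRF paper linked above: repeatedly hand one container to the application with the lowest dominant share, and stop when that application's next container no longer fits. It is a simplified illustration, not YARN's actual scheduler code, and `drf_allocate` is a name invented for this example.

```python
from fractions import Fraction

def drf_allocate(capacity, demands):
    """Greedy DRF sketch. capacity: (cpu, mem); demands: app -> (cpu, mem)."""
    used = [0] * len(capacity)
    containers = {app: 0 for app in demands}

    def dominant_share(app):
        # Largest fraction of any single resource currently held by the app.
        return max(Fraction(containers[app] * d, c)
                   for d, c in zip(demands[app], capacity))

    while True:
        # Pick the app with the lowest dominant share (ties: first listed).
        app = min(demands, key=dominant_share)
        need = demands[app]
        # Stop as soon as the chosen app's next container does not fit.
        if any(u + n > c for u, n, c in zip(used, need, capacity)):
            break
        used = [u + n for u, n in zip(used, need)]
        containers[app] += 1
    return containers, used

containers, used = drf_allocate((100, 10_000), {"A": (2, 300), "B": (6, 100)})
print(containers, used)
```

With the numbers from the book, the loop ends with 20 containers for A and 10 for B, using 100 CPUs and 7000 GB of memory. Exact fractions are used so that ties between equal dominant shares are not disturbed by floating-point rounding.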

Upvotes: 33
