Veer

Reputation: 31

In Hadoop, what do under-replication and over-replication mean, and how do they work?

How does HDFS balance over-replicated and under-replicated blocks?

Upvotes: 3

Views: 3813

Answers (1)

Ram Ghadiyaram

Reputation: 29195

I think you are aware that the default replication factor is 3.

Over-replicated blocks are blocks that exceed the target replication for the file they belong to. Normally, over-replication is not a problem: HDFS automatically deletes the excess replicas. That is how balance is restored in this case.

Under-replicated blocks are blocks that do not meet their target replication for the file they belong to.

To balance these, HDFS automatically creates new replicas of under-replicated blocks until they meet the target replication.
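Under-replication typically appears after a DataNode is lost or after the target replication of a file is raised. You can also change the target replication of a path manually with `hdfs dfs -setrep`; a minimal sketch (the path is hypothetical):

```shell
# Set the target replication of a file to 3.
# -w waits until the NameNode reports the replication is satisfied,
# which can take a while on a busy cluster.
hdfs dfs -setrep -w 3 /user/veer/data.txt

# Verify: the replication factor is the second column of the listing.
hdfs dfs -ls /user/veer/data.txt
```

Lowering the replication factor with the same command produces over-replicated blocks, which the NameNode then schedules for deletion.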

You can get information about the blocks being replicated (or waiting to be replicated) using:

hdfs dfsadmin -metasave <filename>

If you execute the command below, you will get detailed stats:

hdfs fsck /
......................

Status: HEALTHY
 Total size:    511799225 B
 Total dirs:    10
 Total files:   22
 Total blocks (validated):      22 (avg. block size 23263601 B)
 Minimally replicated blocks:   22 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          4
 Number of racks:               1


The filesystem under path '/' is HEALTHY
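If you run fsck regularly, you may want to pull the replication counters out of its output programmatically. A minimal Python sketch, assuming the sample output above; `replication_stats` is a hypothetical helper, not part of any Hadoop API:

```python
import re

# Abbreviated sample output from `hdfs fsck /` (from the answer above).
fsck_output = """\
Status: HEALTHY
 Total blocks (validated):      22 (avg. block size 23263601 B)
 Minimally replicated blocks:   22 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Corrupt blocks:                0
"""

def replication_stats(output: str) -> dict:
    """Extract the block-health counters from fsck output."""
    stats = {}
    for key in ("Over-replicated blocks",
                "Under-replicated blocks",
                "Corrupt blocks"):
        m = re.search(rf"{re.escape(key)}:\s+(\d+)", output)
        if m:
            stats[key] = int(m.group(1))
    return stats

print(replication_stats(fsck_output))
```

A non-zero under-replicated count here is usually transient; the NameNode works the replication queue down on its own.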

Upvotes: 1
