Michail N

HDFS does not replicate blocks

I have recently installed Hadoop (Cloudera). In Cloudera Manager (the GUI for the installation) I get an error saying that I have under-replicated blocks. So when I run

hdfs dfsadmin -report

I get

Configured Capacity: 555730632704 (517.56 GB)
Present Capacity: 524592504832 (488.56 GB)
DFS Remaining: 524592193536 (488.56 GB)
DFS Used: 311296 (304 KB)
DFS Used%: 0.00%
Under replicated blocks: 5
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

This means that for some reason my HDFS does not replicate the blocks. What should I check from here? Is it possible that this is an issue with the HDFS Balancer, and that I need to run it manually?

Upvotes: 0

Views: 797

Answers (1)

Dennis Jaheruddin

There are two main reasons for having under-replicated blocks:

1. Replication factor exceeds available data nodes

Suppose you only have 2 data nodes and your replication factor is 3. Then every block you create will remain under-replicated, because there simply are not 3 data nodes to replicate to.

The solution is either to add data nodes or to reduce the replication factor.
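If you go the second route, a minimal sketch with the standard HDFS CLI, assuming you want a replication factor of 2 on a 2-node cluster (the target path `/` and the value 2 are illustrative):

```shell
# Lower the replication factor of everything already stored in HDFS to 2.
# -w waits until the new replication target is actually reached.
hdfs dfs -setrep -w 2 /

# setrep only affects existing files. For files created later, set
# dfs.replication in hdfs-site.xml (or via the Cloudera Manager UI):
#   <property>
#     <name>dfs.replication</name>
#     <value>2</value>
#   </property>
```

Note that `-setrep` applies per file, so both steps are needed if you want the change to cover existing and future data.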

2. Cluster has been too busy

The cluster will prioritize 'real' work over replication of blocks. Therefore, if you create a large number of blocks, it can take a while for replication to catch up. If your cluster is permanently busy, in theory there may always be some under-replicated blocks.

Note that since you mention it is a new cluster, and the disk appears to be almost empty, I don't think case 2 applies here.


In addition to this, it is of course possible that something actually broke (like the balancing), but I would not worry about that until you have ruled out the two cases above. Most things that break tend to produce an error somewhere, so if you don't see any, this is unlikely to be the cause.
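To rule those cases out, a quick diagnostic sketch using standard HDFS commands (run as the HDFS superuser; the path `/` is illustrative):

```shell
# fsck reports which blocks are under-replicated and, with these flags,
# which files they belong to and where their replicas live.
hdfs fsck / -files -blocks -locations | grep -i "under.replicated"

# Confirm how many DataNodes are actually live; if this number is
# smaller than your replication factor, case 1 is confirmed.
hdfs dfsadmin -report | grep -i "live datanodes"
```

If fsck shows the 5 under-replicated blocks all belong to system files (e.g. under `/tmp` or job history), they were likely written with a higher replication factor than your node count supports.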

Upvotes: 1
