Sajeeva Lakmal

Reputation: 197

How to reduce the replication factor in an HDFS directory and its impact

We are using Hortonworks HDP 2.1 (HDFS 2.4) with a replication factor of 3. We recently decommissioned a DataNode, and that left a lot of under-replicated blocks in the cluster.

The cluster is now trying to satisfy the replication factor by redistributing the under-replicated blocks among the other nodes.

  1. How do I stop that process? I am OK with some files being replicated only twice. If I change the replication factor to 2 on that directory, will the process be terminated?

  2. What is the impact of setting the replication factor to 2 on a directory whose files currently have 3 copies? Will the cluster start another process to remove the excess copy of each file that has 3 copies?

Appreciate your help on this. Kindly share the references too. Thanks. Sajeeva.

Upvotes: 2

Views: 1277

Answers (1)

Chris Nauroth

Reputation: 9854

We recently decommissioned a DataNode, and that left a lot of under-replicated blocks in the cluster.

If the DataNode was decommissioned gracefully, then it should not have resulted in under-replicated blocks. There is an edge case, though: if decommissioning a node brings the total node count below the replication factor set on a file, then by definition that file's blocks will be under-replicated. (For example, consider an HDFS cluster with 3 DataNodes. Decommissioning one node leaves 2 DataNodes, so files with a replication factor of 3 now have under-replicated blocks.)

During decommissioning, HDFS re-replicates (copies) the blocks hosted on that DataNode to other DataNodes in the cluster so that the desired replication factor is maintained.
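To watch this activity, the standard HDFS CLI tools report both decommission status and under-replicated block counts. A minimal sketch (the path / can be narrowed to a specific directory):

    # Per-DataNode report, including decommission status and the
    # cluster-wide count of under-replicated blocks.
    hdfs dfsadmin -report

    # Filesystem check; the summary includes "Under-replicated blocks".
    hdfs fsck /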

  1. How do I stop that process? I am OK with some files being replicated only twice. If I change the replication factor to 2 on that directory, will the process be terminated?

There is no deterministic way to terminate this process as a whole. However, if you lower the replication factor to 2 on some of the under-replicated files, the NameNode will stop scheduling re-replication work for the blocks of those files. This means that, for those blocks, HDFS will stop copying new replicas across DataNodes.

A replication factor of 3 is typical and desirable from a fault-tolerance perspective. You might consider setting the replication factor on those files back to 3 later.
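For reference, changing the replication factor is a single shell command. A minimal sketch, assuming a hypothetical directory /data/mydir (when the path is a directory, setrep applies recursively to all files under it):

    # Lower the replication factor to 2 for every file under the
    # directory; the NameNode stops scheduling re-replication for them.
    hdfs dfs -setrep 2 /data/mydir

    # Later, raise it back to 3; the -w flag waits until
    # re-replication completes before returning.
    hdfs dfs -setrep -w 3 /data/mydir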

  2. What is the impact of setting the replication factor to 2 on a directory whose files currently have 3 copies? Will the cluster start another process to remove the excess copy of each file that has 3 copies?

Yes, the NameNode will flag these files as over-replicated. In response, it will schedule block deletions at DataNodes to restore the desired replication factor of 2. These deletions are dispatched to the DataNodes asynchronously, in response to their heartbeats. Within each DataNode, the block deletion also executes asynchronously, cleaning the underlying files from disk.
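To confirm the cleanup is happening, you can watch the over-replicated block count fall. Another minimal sketch, reusing the hypothetical /data/mydir:

    # Per-file replication status ("repl=..." for each block) plus a
    # summary that includes "Over-replicated blocks"; the count drops
    # as DataNodes delete the excess replicas.
    hdfs fsck /data/mydir -files -blocks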

More details on this are described in the Apache Hadoop Wiki.

Upvotes: 2
