samia

Reputation: 21

Hadoop cluster: is data automatically replicated across the cluster's data nodes?

  1. In a Hadoop cluster, is the data automatically replicated across the data nodes, or must replication be programmed?

  2. If it must be programmed, then how can I do it?

Upvotes: 1

Views: 185

Answers (1)

philantrovert

Reputation: 10082

  1. Replication is automatic. The default value of dfs.replication is 3, and the property is set in your hdfs-site.xml configuration file. This means that when you set up your Hadoop cluster, HDFS is already configured to keep three copies of each block.

  2. It can be changed using the following ways:

    • You can change the value of dfs.replication in your hdfs-site.xml and set it to any positive integer you'd like (1 means a single copy of each block, with no extra replicas).

    • You can set the replication factor for an individual file with the setrep command:

      hadoop fs -setrep -w 3 /user/hadoop/file.txt

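      This sets the replication factor of file.txt to 3, so HDFS keeps three copies of each of its blocks. The -w flag makes the command wait until replication has completed.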
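
For the first option, a minimal sketch of the relevant hdfs-site.xml entry might look like the following. The value 2 is only an illustrative choice, not a recommendation, and the rest of your configuration file is assumed to stay unchanged:

```xml
<configuration>
  <!-- Default number of copies HDFS keeps for each block of newly written files. -->
  <!-- The value 2 is an example; the shipped default is 3. -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```

Note that this default applies to files written after the change. Files that already exist keep their current replication factor until you change it, for example with setrep.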

Upvotes: 1
