Reputation: 21
In a Hadoop cluster, is the data automatically replicated across the data nodes, or must it be programmed?
If it must be programmed, then how can I do it?
Upvotes: 1
Views: 185
Reputation: 10082
Replication is automatic. The default value of dfs.replication is 3, and you can override it in your hdfs-site.xml configuration file. This means that when you set up your Hadoop cluster, HDFS is already configured to keep three copies of each block.
It can be changed using the following ways:
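You can change the value of dfs.replication in your hdfs-site.xml and set it to the integer you'd like (1 means a single copy, i.e. no extra replicas). This sets the default for files written afterwards; existing files keep their current replication factor.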
You can set the replication factor per file using the setrep command:
hadoop fs -setrep -w 3 /user/hadoop/file.txt
This sets the replication factor of file.txt to 3. The -w flag makes the command wait until the replication has completed.
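For the first option, the entry in hdfs-site.xml would look roughly like the sketch below. The value 2 is only an illustrative choice, not a recommendation, and any other properties already in your file should stay as they are.

```xml
<configuration>
  <!-- Default number of copies HDFS keeps for each block of a new file. -->
  <!-- The value 2 is an example; pick whatever suits your cluster. -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```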
Upvotes: 1