Reputation: 206
We have a 10-node HDFS cluster (Hadoop 2.6, Cloudera 5.8): 4 nodes have 10 TB of disk each and 6 nodes have 3 TB each. The disks on the smaller nodes keep filling up, while the larger nodes still have plenty of free space.
I am trying to understand how the NameNode distributes data blocks across nodes with different disk sizes: is the data divided equally, or is some percentage written to each node?
Upvotes: 1
Views: 1270
Reputation: 5947
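You should look at dfs.datanode.fsdataset.volume.choosing.policy. By default it is set to round-robin, but since you have an asymmetric disk setup you should change it to available space.
You can also fine-tune disk usage with the other two volume-choosing properties, dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold and dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction.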
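To give a feel for the difference between the two policies, here is a minimal Python sketch. It is a simplified simulation, not Hadoop's actual implementation: the function names, the two-volume setup, and the block counts are all assumptions made for illustration. The 10 GiB threshold and 0.75 preference are meant to mirror the documented defaults of the two tuning properties. Note that in Apache Hadoop this policy is applied by each DataNode when it picks among its own data directories.

```python
import random

GIB = 2 ** 30
BLOCK = 128 * 2 ** 20  # assumed 128 MiB block size


def choose_round_robin(free, state):
    """Pick volumes in turn, ignoring how much space each has left."""
    idx = state["next"] % len(free)
    state["next"] = idx + 1
    return idx


def choose_available_space(free, state, threshold=10 * GIB, preference=0.75):
    """Simplified available-space choice over a list of free-byte counts."""
    least, most = min(free), max(free)
    # If all volumes are within the threshold of each other, treat them
    # as balanced and fall back to round-robin.
    if most - least <= threshold:
        return choose_round_robin(free, state)
    # Otherwise split volumes into "low" and "high" free-space groups.
    low = [i for i, f in enumerate(free) if f <= least + threshold]
    high = [i for i, f in enumerate(free) if f > least + threshold]
    # Send roughly `preference` of new blocks to the emptier volumes.
    pool = high if random.random() < preference else low
    return random.choice(pool)


def simulate(chooser, free_bytes, blocks):
    """Write `blocks` blocks and return the remaining free bytes per volume."""
    free = list(free_bytes)
    state = {"next": 0}
    for _ in range(blocks):
        free[chooser(free, state)] -= BLOCK
    return free


if __name__ == "__main__":
    start = [300 * GIB, 1000 * GIB]  # hypothetical small and large volume
    for chooser in (choose_round_robin, choose_available_space):
        left = simulate(chooser, start, blocks=1600)
        print(chooser.__name__, [round(f / GIB) for f in left])
```

In this sketch round-robin splits the writes evenly, so the gap in free space between the two volumes never shrinks and the smaller one runs out first. The available-space variant sends most new blocks to the emptier volume, so the gap narrows over time.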
For more information see:
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/admin_dn_storage_balancing.html
Upvotes: 1