Adding an HDFS datanode to a running Spark/Hadoop cluster


I have a Spark cluster with 1 master and 2 nodes (worker + datanode).
I want to add another datanode. The problem is, when I run `hdfs dfs -setrep -R -w 2`, the result is:

    1st datanode -> DFS Used%: 75.61%
    2nd datanode -> DFS Used%: 66.78%
    3rd datanode -> DFS Used%: 8.83%

Do you know how to balance the blocks in HDFS so that each datanode ends up at roughly 30-33%?

Thanks

Upvotes: 2

Views: 448

Answers (1)

franklinsijo

Reputation: 18270

Run the balancer, the cluster balancing utility. It redistributes blocks across the datanodes.

    hdfs balancer -threshold <threshold_value>

`-threshold` is a percentage of disk capacity; the default value is 10.

It specifies that each datanode's disk usage should be brought to within 10% of the cluster's overall usage.

This process may take a long time depending on the amount of data to be balanced, and it does not interrupt cluster operations.
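For instance, a minimal sketch of a balancer run (the 5% threshold and the 100 MB/s bandwidth are illustrative values, not recommendations):

    # Optionally raise the bandwidth each datanode may use for balancing
    # (104857600 bytes/s = 100 MB/s, an example value only)
    hdfs dfsadmin -setBalancerBandwidth 104857600

    # Run the balancer until every datanode is within 5% of the cluster average
    hdfs balancer -threshold 5

    # Check the per-datanode "DFS Used%" figures afterwards
    hdfs dfsadmin -report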

Alternatively, perform datanode commissioning if you plan to add more nodes, as sketched below.
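A rough sketch of commissioning a new datanode, assuming a Hadoop 3.x layout where `$HADOOP_HOME/etc/hadoop/workers` lists the worker hosts and `new-datanode-host` is a placeholder name; exact file names and paths vary across versions and distributions:

    # On the namenode host: register the new worker
    # (placeholder hostname; adjust the path to your installation)
    echo "new-datanode-host" >> $HADOOP_HOME/etc/hadoop/workers

    # If an include file is configured via dfs.hosts in hdfs-site.xml,
    # add the host there too, then have the namenode re-read it
    hdfs dfsadmin -refreshNodes

    # On the new node: start the datanode daemon (Hadoop 3.x syntax;
    # older releases use hadoop-daemon.sh start datanode)
    hdfs --daemon start datanode

The new datanode joins empty, so it fills up only as new blocks are written or when the balancer above is run.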

Upvotes: 1
