Minh Ha Pham

Reputation: 2596

Add new, bigger disks to datanodes

I am running an HDFS cluster with several datanodes; each datanode has 8 x 1TB hard drives.

I want to add 2 x 2TB hard drives to each datanode. I know how to add new drives to a datanode, but I am confused because the new drives are bigger than the old ones, so there may be a problem with data distribution among the drives on a datanode.

I think it would be better to create 2 logical volumes (1TB each) on each 2TB drive and mount them to the OS, so that every datanode data path has the same capacity.

I need some advice. Thanks for reading!

Upvotes: 3

Views: 726

Answers (1)

Stephen ODonnell

Reputation: 4466

If you have mixed-size disks in a datanode, a common problem is that the smaller disks fill faster than the larger ones. This is because the default volume choosing policy in the datanode is round robin: the datanode writes new data to each disk in turn, taking no account of the disks' sizes or free space.

There is an alternative volume choosing policy, AvailableSpaceVolumeChoosingPolicy, which is ideal for datanodes with mixed-size disks. I am not sure which distribution of Hadoop you are using, but the CDH documentation is here:

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/admin_dn_storage_balancing.html#concept_tws_bbg_2r

If you switch to that policy, then by default 75% of new writes will go to the under-used disks until they catch up with the others, after which it falls back to round-robin writes.
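As a sketch, switching the policy means setting a property in hdfs-site.xml on each datanode and restarting it. The property names below come from the Apache Hadoop documentation (the threshold and preference-fraction defaults are 10 GB and 0.75 respectively); double-check them against your Hadoop version before rolling out:

```xml
<!-- hdfs-site.xml on each datanode (restart required).
     Assumption: property names/defaults per Apache Hadoop docs; verify for your version. -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
  <!-- Disks whose free space differs by less than this (bytes) are treated as balanced. -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<property>
  <!-- Fraction of new block writes sent to the disks with more free space (0.75 = 75%). -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
```

With this in place you can mount the 2TB drives as single volumes and skip the two-1TB-partitions workaround, since the policy accounts for free space rather than assuming equal-size disks.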

Upvotes: 3
