Black_Rider

Reputation: 1575

How to explicitly define the datanodes that store a particular file in HDFS?

I want to write a script or something like an .xml file that explicitly defines which datanodes in a Hadoop cluster store the blocks of a particular file. For example: suppose there are 4 slave nodes and 1 master node (5 nodes in total in the Hadoop cluster). There are two files, file01 (size = 120 MB) and file02 (size = 160 MB). The default block size is 64 MB.

Now I want to store one of the two blocks of file01 on slave node1 and the other on slave node2. Similarly, I want one of the three blocks of file02 on slave node1, the second on slave node3, and the third on slave node4. So my question is: how can I do this?
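To make the block counts above concrete, here is a minimal sketch of the arithmetic, assuming HDFS cuts a file into full 64 MB blocks plus one smaller final block. The helper name `split_into_blocks` is made up for illustration and is not part of any Hadoop API.

```python
BLOCK_SIZE_MB = 64  # default block size stated in the question


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file of this size occupies.

    Hypothetical helper: full blocks first, then one partial block
    for whatever is left over.
    """
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


print(split_into_blocks(120))  # file01 -> [64, 56]
print(split_into_blocks(160))  # file02 -> [64, 64, 32]
```

So file01 occupies two blocks and file02 three, which matches the placement described above.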

Actually, there is one method: change the conf/slaves file every time I store a file. But I don't want to do this. So, is there another solution? I hope I made my point clear. Waiting for your kind response!

Upvotes: 2

Views: 1989

Answers (2)

David Gruzman

Reputation: 8088

The NameNode is the ultimate authority on block placement. There is a Jira issue about the requirement to make this algorithm pluggable: https://issues.apache.org/jira/browse/HDFS-385
Unfortunately, it is in the 0.21 version, which is not a production release (although it works not badly at all).
I would suggest plugging your algorithm into 0.21 if you are at the research stage and then waiting for 0.23 to become production, or porting the code back to 0.20 if you do need it now.

Upvotes: 1

Chris White

Reputation: 30089

There is no method to achieve what you are asking here. The name node replicates blocks to data nodes based on rack configuration, replication factor and node availability, so even if you do manage to get a block onto two particular data nodes, if one of those nodes goes down, the name node will replicate the block to another node.

Your requirement also assumes a replication factor of 1, which doesn't give you any data redundancy (which is a bad thing if you lose a data node).

Let the name node manage block assignments, and run the balancer periodically if you want to keep your cluster evenly distributed.
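If it helps, here is a rough sketch of how you could check where the name node actually put a file's blocks, and how to kick off the balancer, from a script. It only builds the argument lists for the standard `hadoop fsck` and `hadoop balancer` commands (on newer releases the same subcommands are also available as `hdfs fsck` and `hdfs balancer`). The function names, the default threshold of 10 and the example path are assumptions for illustration.

```python
import subprocess


def fsck_locations_cmd(hdfs_path):
    """Build the fsck command that lists each block of a file and its datanodes."""
    return ["hadoop", "fsck", hdfs_path, "-files", "-blocks", "-locations"]


def balancer_cmd(threshold_percent=10):
    """Build the balancer command; the threshold is the allowed disk-usage
    deviation (in percent) between a datanode and the cluster average."""
    return ["hadoop", "balancer", "-threshold", str(threshold_percent)]


def run(cmd):
    """Run a command on a machine with the Hadoop client installed
    and return its standard output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run(fsck_locations_cmd("/user/hadoop/file01"))` (a hypothetical path) would return the fsck report for that file, which shows where its blocks ended up. It only reports the placement; it does not let you change it.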

Upvotes: 4
