Reputation: 1575
I want to write a script, or something like an .xml file, that explicitly defines which datanodes in a Hadoop cluster store the blocks of a particular file. For example: suppose there are 4 slave nodes and 1 master node (5 nodes total in the Hadoop cluster), and two files, file01 (size = 120 MB) and file02 (size = 160 MB). The default block size is 64 MB.
Now I want to store one of the two blocks of file01 on slave node 1 and the other on slave node 2. Similarly, I want one of the three blocks of file02 on slave node 1, the second on slave node 3, and the third on slave node 4. So my question is: how can I do this?
There is actually one method: edit the conf/slaves file every time before storing a file. But I don't want to do that, so is there another way? I hope I have made my point clear. Waiting for your kind response!
Upvotes: 2
Views: 1989
Reputation: 8088
The NameNode is the ultimate authority on block placement.
There is a Jira issue about making this placement algorithm pluggable:
https://issues.apache.org/jira/browse/HDFS-385
Unfortunately it landed in the 0.21 release, which is not production-ready (although it works reasonably well).
If you are at the research stage, I would suggest plugging your algorithm into 0.21 and then waiting for 0.23 to become production-ready; or, if you do need it now, backport the code to 0.20.
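To make the idea concrete, here is a minimal stand-alone sketch of the kind of deterministic placement the questioner wants. This is NOT Hadoop API code (the real pluggable hook in HDFS-385 is a `BlockPlacementPolicy` subclass wired into the NameNode); it only simulates mapping each block of a file to a caller-supplied node list, using the question's 64 MB block size and example files. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone simulation of deterministic block placement.
// Not Hadoop code: a real policy would subclass BlockPlacementPolicy.
public class PlacementSketch {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // default 64 MB

    // Assign each block of a file, in order, to the next node in a
    // caller-supplied node list (wrapping around if there are more
    // blocks than nodes).
    static List<String> placeBlocks(long fileSize, List<String> nodes) {
        int numBlocks = (int) ((fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE);
        List<String> placement = new ArrayList<>();
        for (int b = 0; b < numBlocks; b++) {
            placement.add(nodes.get(b % nodes.size()));
        }
        return placement;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // file01: 120 MB -> 2 blocks, desired nodes slave1 and slave2
        System.out.println(placeBlocks(120 * mb, List.of("slave1", "slave2")));
        // file02: 160 MB -> 3 blocks, desired nodes slave1, slave3, slave4
        System.out.println(placeBlocks(160 * mb, List.of("slave1", "slave3", "slave4")));
    }
}
```

With a replication factor above 1 a real policy must also choose replica targets, which is where the NameNode's rack-awareness logic comes in.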
Upvotes: 1
Reputation: 30089
There is no supported way to achieve what you are asking here: the NameNode replicates blocks to datanodes based on rack configuration, replication factor, and node availability. Even if you did manage to get a block onto two particular datanodes, if one of those nodes goes down, the NameNode will re-replicate the block to another node.
Your requirement also assumes a replication factor of 1, which gives you no data redundancy (a bad thing if you lose a datanode).
Let the NameNode manage block assignments, and run the balancer periodically if you want to keep your cluster evenly distributed.
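For reference, in 0.20-era Hadoop the balancer is started from the command line; the threshold value here is just an example (it is the allowed deviation, in percent, of each node's disk usage from the cluster average):

```shell
# Rebalance until every datanode is within 10% of the cluster-average usage
hadoop balancer -threshold 10
```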
Upvotes: 4