Reputation: 131
Does Hadoop guarantee that different blocks from the same file will be stored on different machines in the cluster? Obviously, replicated blocks will be on different machines.
Upvotes: 1
Views: 547
Reputation: 20969
No, Hadoop does not guarantee that. It would also be a big reliability problem: if a job reads a file and a datanode holding one of its blocks is down, the whole job fails just because that block is unavailable. I can't imagine the use case behind your question; maybe you can tell us a bit more so we can understand what your intention really is.
Upvotes: 0
Reputation: 2669
On the contrary, I think. Setting replication aside, each datanode stores each block it holds as its own file in its local file system.
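If you want to see that for yourself, here is a rough sketch that lists those per-block files on a datanode's local disk. The directory below is only an example; the real location is whatever dfs.data.dir (dfs.datanode.data.dir in newer releases) points to on your datanode:

    import java.io.File;

    public class ListLocalBlocks {
        public static void main(String[] args) {
            // Example location only; on a real datanode use the directory
            // configured via dfs.data.dir.
            File dataDir = new File(args.length > 0 ? args[0]
                    : "/var/lib/hadoop/dfs/data/current");

            File[] files = dataDir.listFiles();
            if (files == null) {
                System.err.println("Not a readable directory: " + dataDir);
                return;
            }
            // Each HDFS block the datanode holds shows up as a blk_<id> file,
            // with a matching blk_<id>.meta checksum file next to it.
            for (File f : files) {
                if (f.getName().startsWith("blk_")) {
                    System.out.println(f.getName() + "  " + f.length() + " bytes");
                }
            }
        }
    }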
Upvotes: 0
Reputation: 38265
Apparently not: http://hadoop.apache.org/common/docs/r0.20.2/hdfs_user_guide.html#Rebalancer
Upvotes: 0
Reputation: 35828
No. If you look at the HDFS Architecture Guide, you'll see in the diagram that the file part-1 has a replication factor of 3 and is made up of three blocks labelled 2, 4, and 5. Note how blocks 2 and 5 sit on the same Datanode in one case.
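If you'd rather check on your own cluster than trust the diagram, something like the following prints which datanodes hold each block of a file. This is a rough sketch using the standard FileSystem API; the class name and the path argument are just placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.util.Arrays;

    public class BlockLocations {
        public static void main(String[] args) throws Exception {
            // Path of the file to inspect, e.g. /user/someone/part-1
            Path path = new Path(args[0]);

            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            FileStatus status = fs.getFileStatus(path);

            // One BlockLocation per block; getHosts() lists the datanodes
            // holding replicas of that block.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());

            for (int i = 0; i < blocks.length; i++) {
                System.out.println("block " + i
                        + " offset " + blocks[i].getOffset()
                        + " hosts " + Arrays.toString(blocks[i].getHosts()));
            }
        }
    }

If the same hostname shows up for two different blocks, those blocks share a datanode, which is exactly the situation the diagram shows.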
Upvotes: 1