Reputation: 176
What is the best and fastest way to copy files in parallel from an NFS mount into Hadoop? We have a mount with a huge number of files and we need to copy them into HDFS.
Some options:
Regards, JD
Upvotes: 1
Views: 3135
Reputation: 111
I think the key question is: what is on the source side of the NFS link? If it is a NAS, you are likely to be better off with several client machines running copyFromLocal at the same time (one copy per machine). Even high-performance NAS devices are going to be displeased when you have more than 5-10 simultaneous disk reads from the same client. I would model the following (all with copyFromLocal):
I would definitely avoid MapReduce (M/R), as the process startup cost is too high. Even distcp won't do as well, because you won't be able to control how heavily the source NAS is hit, and the NAS will be your bottleneck.
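To make the idea of controlling the load on the NAS concrete, here is a minimal Python sketch of what one client machine could run. It is an illustration only, not a tested tool: the helper names (`build_command`, `copy_one`, `parallel_copy`) and the `max_readers` default are my own assumptions, and it assumes the `hadoop` command is on the PATH. It shells out to `hadoop fs -copyFromLocal` once per file and caps how many copies run at once.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(local_path, hdfs_dir):
    """Build the copyFromLocal command line for a single file."""
    return ["hadoop", "fs", "-copyFromLocal", str(local_path), hdfs_dir]


def copy_one(local_path, hdfs_dir, run=subprocess.run):
    """Copy one file and return (path, exit code). `run` is injectable for dry runs."""
    result = run(build_command(local_path, hdfs_dir))
    return str(local_path), result.returncode


def parallel_copy(files, hdfs_dir, max_readers=4, run=subprocess.run):
    """Copy files with at most `max_readers` simultaneous reads from the mount.

    max_readers is a hypothetical knob: tune it against what the NAS tolerates.
    Results come back in the same order as `files`.
    """
    with ThreadPoolExecutor(max_workers=max_readers) as pool:
        return list(pool.map(lambda f: copy_one(f, hdfs_dir, run), files))
```

You would call something like `parallel_copy(my_files, "/user/jd/data", max_readers=4)` on each client, giving every machine a disjoint slice of the file list; the paths here are placeholders. Any file whose exit code is non-zero can be retried afterwards. Starting a JVM per file is costly for very small files, so batching several files into one command may be worth trying.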
Upvotes: 1