lucky_start_izumi

Reputation: 2591

About Hadoop FileSystem copyFromLocalFile

I am writing code to transfer files to Hadoop HDFS in parallel, so I have many threads calling FileSystem.copyFromLocalFile.

I think the cost of opening a FileSystem is not small, so I open only one FileSystem instance for the whole project. I thought there might be a problem with so many threads calling it at the same time, but so far it works fine.
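To make the setup concrete, here is a minimal sketch of many worker threads sharing one copier object. The Copier interface, the copyAll method and the paths are hypothetical. In a real program the shared object would be the single FileSystem obtained from FileSystem.get(conf), and each task would call fs.copyFromLocalFile(false, new Path(src), new Path(dst)). A fake copier is used here so the sketch runs without a cluster. Note that FileSystem.get caches instances by default, so repeated calls with the same URI and user typically return the same object anyway.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedCopierDemo {

    // Hypothetical stand-in for the one shared Hadoop FileSystem instance.
    // Real code might wrap fs.copyFromLocalFile(false, new Path(src), new Path(dst)).
    public interface Copier {
        void copy(String src, String dst) throws Exception;
    }

    // Submits one copy task per file; every task uses the same Copier object.
    public static List<String> copyAll(Copier shared, List<String> files,
                                       String dstDir, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String f : files) {
                futures.add(pool.submit(() -> {
                    String dst = dstDir + "/" + f;
                    shared.copy(f, dst);
                    return dst;
                }));
            }
            // Collect results in submission order; get() rethrows
            // (wrapped in ExecutionException) if that copy failed.
            List<String> done = new ArrayList<>();
            for (Future<String> fut : futures) {
                done.add(fut.get());
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        ConcurrentLinkedQueue<String> log = new ConcurrentLinkedQueue<>();
        Copier fake = (src, dst) -> log.add(src + " -> " + dst);
        List<String> result = copyAll(fake, List.of("a.txt", "b.txt", "c.txt"), "/user/data", 3);
        System.out.println(result);
        System.out.println(log.size());
    }
}
```

Running main prints the three destination paths in submission order, followed by a log size of 3.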

Could anyone please give me some information about how this copy method behaves when it is called from many threads? Thank you very much, and have a great weekend.

Upvotes: 1

Views: 236

Answers (2)

David Gruzman

Reputation: 8088

I see the following design points to consider:
a) Where will the bottleneck be? I think that with 2-3 parallel copy operations, the local disk or a 1 Gbit Ethernet link will become the bottleneck. You can implement this as a multithreaded application or run a few processes. Either way, I do not think you need a high level of parallelism.
b) Error handling. The failure of one thread should not stop the whole process, and at the same time no file should be lost. What I usually do in such cases is agree that, in the worst case, a file can be copied twice. If that is OK, the system can work in a simple "copy then delete" scenario.
c) If you copy from one of the cluster nodes, HDFS will become unbalanced, since one replica will be stored on the host you copy from. You will need to rebalance the cluster constantly.
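Points a) and b) can be combined in a small sketch. It uses a fixed pool of only a few threads and handles errors per file, so one failure does not stop the rest. It also follows the "copy then delete" rule, so a file is at worst copied twice but never lost. The Uploader interface, the uploadAll method and the temp-directory setup are hypothetical stand-ins. With Hadoop, upload could wrap fs.copyFromLocalFile(false, src, dst). That overload's delSrc flag can also delete the source, but deleting explicitly here keeps the copy-then-delete order visible.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path; // java.nio Path, not org.apache.hadoop.fs.Path
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CopyThenDelete {

    // Hypothetical upload step; real code might call fs.copyFromLocalFile(false, src, dst).
    public interface Uploader {
        void upload(Path local) throws IOException;
    }

    // Uploads files with a small, fixed number of threads.
    // A failure is recorded and skipped; it never stops the other uploads.
    // The local file is deleted only after its upload succeeded, so a failure
    // between upload and delete means it is re-sent later (copied twice), never lost.
    public static List<Path> uploadAll(List<Path> files, Uploader uploader, int threads)
            throws InterruptedException {
        List<Path> failed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Path file : files) {
            pool.execute(() -> {
                try {
                    uploader.upload(file); // 1. copy
                    Files.delete(file);    // 2. then delete the local source
                } catch (IOException e) {
                    failed.add(file);      // keep the local file for a retry
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return failed;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("outbox");
        List<Path> files = new ArrayList<>();
        for (String name : List.of("ok1.log", "bad.log", "ok2.log")) {
            files.add(Files.writeString(dir.resolve(name), name));
        }
        // Simulated uploader that fails for one file.
        Uploader flaky = p -> {
            if (p.getFileName().toString().startsWith("bad")) {
                throw new IOException("simulated failure: " + p);
            }
        };
        List<Path> failed = uploadAll(files, flaky, 3);
        System.out.println("failed: " + failed);
        System.out.println("bad.log still present: " + Files.exists(dir.resolve("bad.log")));
        System.out.println("ok1.log still present: " + Files.exists(dir.resolve("ok1.log")));
    }
}
```

With the simulated failure, only bad.log ends up in the failed list and stays in the local directory for a retry. The successfully uploaded files are deleted.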

Upvotes: 1

SSaikia_JtheRocker

Reputation: 5063

Can you tell me what more information you want about copyFromLocalFile()?

I'm not sure, but I guess that in your case the threads share the same resource. Since you have only one instance of FileSystem, each thread will probably share this object with the others on a time-sharing basis.

Upvotes: 0
