Kalmesh Sam

Reputation: 71

Pull a file from a remote location (the local file system of a remote machine) into Hadoop HDFS

I have files on a machine (say, A) that is not part of the Hadoop (or HDFS) datacenter, so machine A is remote from the HDFS datacenter.

Is there a script, command, program, or tool that can run on the machines connected to Hadoop (part of the datacenter) and pull the file from machine A into HDFS directly? If so, what is the best and fastest way to do this?

I know there are many options, such as WebHDFS and Talend, but they need to run on machine A, and the requirement is to avoid that and run everything on machines in the datacenter.

Upvotes: 1

Views: 981

Answers (2)

Kiranb

Reputation: 31

Please tell me if I understand your question correctly:

  1. You want to copy a file that is stored in a remote location.

  2. The client machine is not part of the Hadoop cluster.

  3. It may not have the required Hadoop libraries installed.

The best way is WebHDFS, i.e. the REST API.
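A minimal sketch of the WebHDFS route, assuming Java 11+ (`java.net.http.HttpClient`), WebHDFS enabled on the cluster, and simple (non-Kerberos) authentication through the `user.name` query parameter. It follows the two-step CREATE described in the WebHDFS documentation: a PUT to the NameNode returns a 307 redirect whose `Location` header points at a DataNode, and the file bytes are then PUT to that URL, which answers 201 Created. The class and method names, host, user, and paths are hypothetical placeholders; 9870 is the Hadoop 3 default NameNode HTTP port (Hadoop 2 used 50070). Like any push-style client it runs on the machine that holds the file, but it needs only the JDK, not the Hadoop libraries.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical helper: uploads one local file to HDFS over WebHDFS (two-step CREATE).
final class WebHdfsUpload {

    // Step 1 URL: asks the NameNode where to write; the NameNode replies with a 307 redirect.
    static URI createUri(String nameNode, int port, String hdfsPath, String user) {
        return URI.create("http://" + nameNode + ":" + port + "/webhdfs/v1" + hdfsPath
                + "?op=CREATE&overwrite=true&user.name=" + user);
    }

    static void upload(Path localFile, String nameNode, int port, String hdfsPath, String user)
            throws IOException, InterruptedException {
        // Redirects are not followed automatically: we need the DataNode URL from Location.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();

        // Step 1: empty PUT to the NameNode.
        HttpRequest step1 = HttpRequest.newBuilder(createUri(nameNode, port, hdfsPath, user))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> redirect = client.send(step1, HttpResponse.BodyHandlers.discarding());
        String dataNodeUrl = redirect.headers().firstValue("Location")
                .orElseThrow(() -> new IOException("No redirect, status " + redirect.statusCode()));

        // Step 2: PUT the file contents to the DataNode; success is 201 Created.
        HttpRequest step2 = HttpRequest.newBuilder(URI.create(dataNodeUrl))
                .PUT(HttpRequest.BodyPublishers.ofFile(localFile))
                .build();
        HttpResponse<String> created = client.send(step2, HttpResponse.BodyHandlers.ofString());
        if (created.statusCode() != 201) {
            throw new IOException("Upload failed: " + created.statusCode() + " " + created.body());
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder NameNode host, HDFS path and user; replace with your own.
        upload(Path.of(args[0]), "namenode.example.com", 9870, "/user/hadoop/file.txt", "hadoop");
    }
}
```

Automatic redirects are disabled on purpose: if the client followed the 307 by itself, it would resend the empty first request instead of the file body.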

Upvotes: 1

Harman

Reputation: 727

There are two ways to achieve this:

  1. You can pull the data with scp into a temporary location, copy it to HDFS, and then delete the temporary copy.

  2. If you do not want a two-step process, you can write a program that reads the files from the remote machine and writes them to HDFS directly.

    This question, along with its comments and answers, should come in handy for reading the file; to write to HDFS, you can use the snippet below.

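    import java.io.InputStream;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    // in: stream reading the file from the remote machine; outFile: full HDFS path incl. file name, e.g. hdfs://localhost:<port>/foo/bar/baz.txt
    static void writeToHdfs(InputStream in, String outFile) throws Exception {
        FileSystem hdfs = FileSystem.get(new URI("hdfs://<NameNode Host>:<port>"), new Configuration());
        Path newFilePath = new Path(outFile);
        byte[] buffer = new byte[50 * 1024];
        try (FSDataOutputStream out = hdfs.create(newFilePath)) {
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) { // read until EOF
                out.write(buffer, 0, bytesRead);          // write only the bytes actually read
            }
        }
    }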

    Set the buffer size to 50 * 1024 bytes if you have enough I/O capacity (this depends on your machine); otherwise you could use a much lower value, such as 10 * 1024.
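Approach 1 could be driven from a datacenter machine roughly as follows. This is a sketch, assuming key-based (passwordless) SSH from the datacenter machine to machine A and the `hdfs` command-line client on the PATH; the class name, helper methods, the host `user@machineA`, and the paths are hypothetical. It simply shells out to `scp` and `hdfs dfs -put`, and deletes the temporary copy in a `finally` block even when a step fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: scp from machine A into a temp file, put it into HDFS, delete the temp file.
final class ScpThenPut {

    // Command that pulls remoteFile from machine A into the local temporary file.
    static List<String> scpCommand(String userAtHost, String remoteFile, Path localTmp) {
        return List.of("scp", userAtHost + ":" + remoteFile, localTmp.toString());
    }

    // HDFS shell command that uploads the temporary copy to hdfsDest.
    static List<String> putCommand(Path localTmp, String hdfsDest) {
        return List.of("hdfs", "dfs", "-put", localTmp.toString(), hdfsDest);
    }

    // Runs one command with inherited stdout/stderr and fails on a non-zero exit code.
    static void run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("Command failed (exit " + exit + "): " + command);
        }
    }

    static void copy(String userAtHost, String remoteFile, String hdfsDest)
            throws IOException, InterruptedException {
        Path localTmp = Files.createTempFile("remote-copy-", ".tmp");
        try {
            run(scpCommand(userAtHost, remoteFile, localTmp)); // step 1: pull from machine A
            run(putCommand(localTmp, hdfsDest));               // step 2: copy into HDFS
        } finally {
            Files.deleteIfExists(localTmp);                    // step 3: remove the temporary copy
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical host and paths; replace with your own.
        copy("user@machineA", "/data/input.csv", "/landing/input.csv");
    }
}
```

Note that `hdfs dfs -put` refuses to overwrite an existing destination file unless `-f` is added to the command.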

Upvotes: 2
