Serban Stoenescu

Reputation: 3886

Java transfer from HDFS to S3

I want to transfer a file from HDFS to S3 in Java. Some files may be huge, so I don't want to download my file locally before uploading it to S3. Is there any way to do that in Java?

Here's what I have right now: a piece of code that uploads a local directory to S3. I can't really use this, because a File object requires the data to already be on my local disk.

File f = new File("/home/myuser/test");

TransferManager transferManager  = new TransferManager(credentials);
MultipleFileUpload upload = transferManager.uploadDirectory("mybucket","test_folder",f,true);

Thanks

Upvotes: 4

Views: 4259

Answers (1)

Serban Stoenescu

Reputation: 3886

I figured out the uploading part.

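import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.Upload;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

    AWSCredentials credentials = new BasicAWSCredentials(
            "whatever",
            "whatever");

    TransferManager transferManager = new TransferManager(credentials);

    //+upload from HDFS to S3
    Configuration conf = new Configuration();
    // set the hadoop config files
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

    Path path = new Path("hdfs://my_ip_address/user/ubuntu/test/test.txt");
    FileSystem fs = path.getFileSystem(conf);

    // Set the content length up front; without it the SDK buffers the
    // whole stream in memory to work out the size, which defeats the
    // purpose for huge files.
    ObjectMetadata objectMetadata = new ObjectMetadata();
    objectMetadata.setContentLength(fs.getFileStatus(path).getLen());

    // try-with-resources closes the HDFS stream once the upload is done
    try (FSDataInputStream inputStream = fs.open(path)) {
        Upload upload = transferManager.upload("xpatterns-deployment-ubuntu", "test_cu_jmen3", inputStream, objectMetadata);
        upload.waitForCompletion();
    } catch (InterruptedException e) {
        // restore the interrupt flag so callers can see the interruption
        Thread.currentThread().interrupt();
        e.printStackTrace();
    }
    //-upload from HDFS to S3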

Any ideas about how to do something similar for downloading? I haven't found any download() method in TransferManager that can use a stream like in the above code.
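
One direction that might work for the download side, as a rough sketch rather than a tested solution: skip TransferManager and copy the bytes stream-to-stream. The helper below is hypothetical (the class name StreamCopy and its copy method are not part of the AWS SDK or Hadoop) and uses only java.io, so nothing touches the local disk and memory use stays at one small buffer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration; not part of the AWS SDK or Hadoop.
final class StreamCopy {
    private StreamCopy() {}

    // Copies everything from 'in' to 'out' through a fixed-size buffer,
    // so memory use stays constant no matter how large the object is.
    // Returns the number of bytes copied. Closes neither stream.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }
}
```

Assuming an AmazonS3 client named s3, the input side would be s3.getObject(bucket, key).getObjectContent() and the output side fs.create(path) on the same FileSystem as in the upload code, with both streams closed afterwards. Hadoop's own org.apache.hadoop.io.IOUtils.copyBytes(in, out, conf) should be able to replace the helper.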

Upvotes: 3
