jay

Reputation: 81

Is there a way to read a CSV file from S3 using Java without downloading it?

I was able to connect Java to AWS S3, and I was able to perform basic operations like listing buckets. I need a way to read a CSV file without downloading it. I am attaching my current code here.

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.Bucket;

import java.util.List;

public class test {

    public static void main(String[] args) {

        // access key and secret key redacted
        AWSCredentials credentials = new BasicAWSCredentials("----", "----");

        AmazonS3 s3client = AmazonS3ClientBuilder
                .standard()
                .withCredentials(new AWSStaticCredentialsProvider(credentials))
                .withRegion(Regions.US_EAST_2)
                .build();

        // list every bucket visible to these credentials
        List<Bucket> buckets = s3client.listBuckets();
        for (Bucket bucket : buckets) {
            System.out.println(bucket.getName());
        }
    }

}

Upvotes: 3

Views: 12793

Answers (5)

Mukit09

Reputation: 3399

I know I am late to answer it, but I think it may help future seekers.

We can use the AWS SDK. There is an example in the AWS docs; see the Java section.

If you follow the doc as written, the file is downloaded, because the example uses a FileOutputStream to save the result. But you can store the output of your query in a String instead, like this:

String string = IOUtils.toString(resultInputStream);

If you follow that example, you already know that resultInputStream is the output of your query.
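The answer does not say which library's IOUtils it means. If none is on your classpath, the same step can be sketched with the standard library alone, assuming Java 9+ (for readAllBytes) and UTF-8 content. The class name StreamToString and the in-memory stand-in for resultInputStream are made up for illustration. Note that this holds the whole result in memory, so it only suits results of modest size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamToString {

    // Reads the whole stream into memory and decodes it as UTF-8.
    // UTF-8 is an assumption; use the charset the CSV was written in.
    static String readAll(InputStream in) {
        try (InputStream input = in) {
            return new String(input.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for resultInputStream so the sketch runs without AWS access.
        InputStream resultInputStream =
                new ByteArrayInputStream("id,name\n1,alice\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(resultInputStream));
    }
}
```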

Upvotes: 0

Mark

Reputation: 2428

If you can use Java NIO Paths and Channels, which are available in Java 1.8 and later, then you can use the aws-java-nio-spi-for-s3 package. With that package on your classpath, Java calls that open a Channel or Path for an s3 URI are delegated to the package, so the application can access these files directly without staging them locally.

https://github.com/awslabs/aws-java-nio-spi-for-s3

Upvotes: 0
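As a rough sketch of what this looks like in application code: the helper below only uses standard java.nio.file calls, so it works with any Path. It runs against a local temp file here; the assumption, based on the description above, is that a Path built from an s3 URI would behave the same once the package is on the classpath. The class name NioCsvReader, the readRows helper and the bucket and key in the comment are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class NioCsvReader {

    // Reads at most maxRows lines from the path and splits each one on commas.
    // The reader streams the content, so reading stops after maxRows lines.
    static List<String[]> readRows(Path csv, int maxRows) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while (rows.size() < maxRows && (line = reader.readLine()) != null) {
                rows.add(line.split(","));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Local temp file as a stand-in. With the S3 provider installed, the path
        // would instead come from something like
        // Paths.get(URI.create("s3://example-bucket/data.csv")) (hypothetical names).
        Path csv = Files.createTempFile("sample", ".csv");
        Files.write(csv, List.of("id,name", "1,alice", "2,bob"));
        for (String[] row : readRows(csv, 2)) {
            System.out.println(String.join(" | ", row));
        }
        Files.delete(csv);
    }
}
```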

Khtarnas

Reputation: 31

The answer by @jay and @Elikill58 is super helpful! This just adds a bit of clarity and accessibility to it.

To get an object from an S3 bucket after you have done all the authentication work, use the .getObject(String bucketName, String fileName) function. Note what the documentation says about file names:

An Amazon S3 bucket has no directory hierarchy such as you would find in a typical computer file system. You can, however, create a logical hierarchy by using object key names that imply a folder structure. For example, instead of naming an object sample.jpg, you can name it photos/2006/February/sample.jpg.

To get an object from such a logical hierarchy, specify the full key name for the object in the GET operation. For a virtual hosted-style request example, if you have the object photos/2006/February/sample.jpg, specify the resource as /photos/2006/February/sample.jpg. For a path-style request example, if you have the object photos/2006/February/sample.jpg in the bucket named examplebucket, specify the resource as /examplebucket/photos/2006/February/sample.jpg.

Once you have the S3Object that is returned, just pass it into the function below (which is just a modified version of @jay's that fixes a few errors)!

import com.amazonaws.services.s3.model.S3Object;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

private static void parseCSVS3Object(S3Object data) {
    // try-with-resources closes the reader and, with it, the S3 content stream
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(data.getObjectContent()))) {
        // The first line holds the csv headers
        String line = reader.readLine();
        if (line == null) {
            return; // empty object, nothing to print
        }

        // Print the headers
        for (String header : line.split(",")) {
            System.out.print(header + "   ");
        }

        while ((line = reader.readLine()) != null) {
            System.out.println();

            // get and print the next line (row)
            for (String value : line.split(",")) {
                System.out.print(value + "   ");
            }
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
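One caveat with line.split(","): it also splits on commas inside quoted fields, and it drops trailing empty columns. If your CSV has quoted values, a small hand-rolled splitter along these lines may be enough. This is a minimal sketch, not a full CSV parser; the class name CsvLine is made up, and quoted fields that span several lines are not handled.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLine {

    // Splits one CSV line on commas, treating commas inside double quotes as data.
    // A doubled quote ("") inside a quoted field is read as one literal quote.
    // Quoted fields that span multiple lines are not supported.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());  // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        // The quoted name contains a comma but stays one field.
        for (String field : split("1,\"Doe, Jane\",NY")) {
            System.out.println(field);
        }
    }
}
```

To use it in the function above, replace each line.split(",") with CsvLine.split(line).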

Upvotes: 3

jay

Reputation: 81

There is a way, with code like this. I fetch the file we want to read into an S3Object obj, then pass its content stream to InputStreamReader():

S3Object obj = s3client.getObject("<Bucket Name>", "File Name");
BufferedReader reader = new BufferedReader(new InputStreamReader(obj.getObjectContent()));

// read the first row and split it into its columns
String line = reader.readLine();
String[] row = line.split(",");

// this will fetch number of columns
int length = row.length;

while ((line = reader.readLine()) != null) {

    // storing the values of the current line in an array
    String[] value = line.split(",");

    // stop early if a row has fewer values than the first row
    for (int i = 0; i < length && i < value.length; i++) {
        System.out.print(value[i] + "   ");
    }
    System.out.println();
}
reader.close();

Upvotes: 3

Joshua Fox

Reputation: 19655

For your code to read the file, it needs the contents, and that means copying them to the local system, even if only into memory.

However, you can request a byte "range" (in the Java SDK, via the range setting on GetObjectRequest) to read just a part.
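One practical wrinkle with a ranged read of a CSV is that the range usually ends in the middle of a row. A minimal sketch of handling that, assuming the chunk starts at the beginning of the object and is UTF-8: keep everything up to the last newline and drop the trailing fragment. The class name RangeChunk and the sample bytes are hypothetical; in practice the byte array would be whatever the ranged request returned.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RangeChunk {

    // Returns the complete lines in a chunk that starts at the beginning of the
    // object. Bytes after the last newline are an unfinished row and are dropped.
    static List<String> completeLines(byte[] chunk) {
        // Scan backwards for the last newline byte; in UTF-8 the byte 0x0A
        // never appears inside a multi-byte character.
        int end = chunk.length;
        while (end > 0 && chunk[end - 1] != '\n') {
            end--;
        }
        List<String> lines = new ArrayList<>();
        if (end == 0) {
            return lines; // no complete line in this chunk
        }
        String text = new String(chunk, 0, end, StandardCharsets.UTF_8);
        for (String line : text.split("\r?\n")) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Pretend these are the first 21 bytes of a larger CSV object.
        byte[] chunk = "id,name\n1,alice\n2,bo".getBytes(StandardCharsets.UTF_8);
        for (String line : completeLines(chunk)) {
            System.out.println(line);
        }
    }
}
```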

Upvotes: 0
