Reputation: 237
I need to read a large (>15 MB) file (say sample.csv) from an Amazon S3 bucket, process the data in sample.csv, and write the result to another directory in the same S3 bucket. I intend to use an AWS Lambda function to run my Java code.
As a first step I developed Java code that runs on my local system. It reads the sample.csv file from the S3 bucket, and I used the putObject method to write data back to the bucket. But I find that only the last line is processed and written back.
Region clientRegion = Region.Myregion;
AwsBasicCredentials awsCreds = AwsBasicCredentials.create("myAccessId", "mySecretKey");
S3Client s3Client = S3Client.builder()
        .region(clientRegion)
        .credentialsProvider(StaticCredentialsProvider.create(awsCreds))
        .build();

ResponseInputStream<GetObjectResponse> s3objectResponse = s3Client.getObject(
        GetObjectRequest.builder().bucket(bucketName).key("Input/sample.csv").build());
BufferedReader reader = new BufferedReader(new InputStreamReader(s3objectResponse));

String line = null;
while ((line = reader.readLine()) != null) {
    s3Client.putObject(
            PutObjectRequest.builder().bucket(bucketName).key("Test/Testout.csv").build(),
            RequestBody.fromString(line));
}
Example: sample.csv contains
1,sam,21,java,beginner;
2,tom,28,python,practitioner;
3,john,35,c#,expert.
My output should be
1,mas,XX,java,beginner;
2,mot,XX,python,practitioner;
3,nhoj,XX,c#,expert.
But only 3,nhoj,XX,c#,expert is written to Testout.csv.
Upvotes: 5
Views: 8653
Reputation: 270154
The putObject() method creates an Amazon S3 object. It is not possible to append to or modify an existing S3 object, so each time the while loop executes, it creates a new object that replaces the previous one at the same key. That is why only the last line survives in Testout.csv.
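One minimal way to see the fix (not the approach recommended below) is to accumulate the processed lines in memory and call putObject() once at the end. A sketch, reusing the s3Client, reader, and bucketName from the question; transform() is a hypothetical stand-in for your processing logic:

StringBuilder out = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    // transform() = your per-line processing (hypothetical)
    out.append(transform(line)).append('\n');
}
// A single putObject() call writes the complete object once
s3Client.putObject(
        PutObjectRequest.builder().bucket(bucketName).key("Test/Testout.csv").build(),
        RequestBody.fromString(out.toString()));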
Instead, I would recommend:

- Download the file from Amazon S3 to local disk (use GetObject() with a destinationFile to download to disk)
- Process the file locally, writing the output to a local file
- Upload the output file to Amazon S3 (use PutObject() with the file as the source)

This separates the AWS code from your processing code, which should be easier to maintain.
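A sketch of that approach with the v2 SDK, assuming the same s3Client and bucketName as in the question, /tmp as scratch space (the only writable directory in Lambda), and the same hypothetical transform() for your processing logic:

// 1. Download the source object to local disk
Path input = Paths.get("/tmp/sample.csv");
Path output = Paths.get("/tmp/Testout.csv");
Files.deleteIfExists(input); // getObject(request, path) fails if the file already exists
s3Client.getObject(
        GetObjectRequest.builder().bucket(bucketName).key("Input/sample.csv").build(),
        input);

// 2. Process line by line, writing every result to the local output file
try (BufferedReader reader = Files.newBufferedReader(input);
     BufferedWriter writer = Files.newBufferedWriter(output)) {
    String line;
    while ((line = reader.readLine()) != null) {
        writer.write(transform(line));
        writer.newLine();
    }
}

// 3. Upload the complete output file as a single S3 object
s3Client.putObject(
        PutObjectRequest.builder().bucket(bucketName).key("Test/Testout.csv").build(),
        RequestBody.fromFile(output));

Because the processing step is plain java.io/java.nio code, you can unit-test it without any AWS dependencies.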
Upvotes: 5