Reputation: 583
Normally when a file has to be uploaded to S3, it first has to be written to disk before using something like the TransferManager API to upload it to the cloud. This can cause data loss if the upload does not finish in time (the application goes down and restarts on a different server, etc.). So I was wondering if it's possible to write directly to a stream across the network, with the required cloud location as the sink.
Upvotes: 19
Views: 24506
Reputation: 11
import com.amazonaws.services.s3.model.*;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Assumes `s3` is an initialized AmazonS3 client and `bucketName` is the target bucket.
public void saveS3Object(String key, InputStream inputStream) throws Exception {
    List<PartETag> partETags = new ArrayList<>();

    // Start the multipart upload and remember its upload ID.
    InitiateMultipartUploadRequest initRequest =
            new InitiateMultipartUploadRequest(bucketName, key);
    InitiateMultipartUploadResult initResponse = s3.initiateMultipartUpload(initRequest);

    int partSize = 5 * 1024 * 1024; // 5 MB: the minimum size for every part except the last.
    try {
        byte[] buffer = new byte[partSize];
        int partNumber = 1;
        int len;
        // Read full 5 MB chunks; only the last part may be smaller.
        while ((len = fill(inputStream, buffer)) > 0) {
            ByteArrayInputStream partInputStream = new ByteArrayInputStream(buffer, 0, len);
            UploadPartRequest uploadRequest = new UploadPartRequest()
                    .withBucketName(bucketName).withKey(key)
                    .withUploadId(initResponse.getUploadId())
                    .withPartNumber(partNumber)
                    .withInputStream(partInputStream)
                    .withPartSize(len);
            partETags.add(s3.uploadPart(uploadRequest).getPartETag());
            partNumber++;
        }
        s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
                bucketName, key, initResponse.getUploadId(), partETags));
    } catch (Exception e) {
        // Abort so the already-uploaded parts don't keep accruing storage charges.
        s3.abortMultipartUpload(new AbortMultipartUploadRequest(
                bucketName, key, initResponse.getUploadId()));
        throw e;
    }
}

// InputStream.read() may return fewer bytes than requested (especially on network streams),
// so keep reading until the buffer is full or the stream ends. This keeps every part except
// the last at the 5 MB minimum that S3 requires.
private static int fill(InputStream in, byte[] buffer) throws IOException {
    int total = 0;
    int read;
    while (total < buffer.length
            && (read = in.read(buffer, total, buffer.length - total)) > 0) {
        total += read;
    }
    return total;
}
Upvotes: 1
Reputation: 33504
It is possible:
AmazonS3 s3Client = AmazonS3ClientBuilder.standard().build();
s3Client.putObject("bucket", "key", yourInputStream, objectMetadata);
Upvotes: 4
Reputation: 1950
Surprisingly, this is not possible (at the time of writing this post) with the standard Java SDK. Thanks to this third-party library, however, you can at least avoid buffering huge amounts of data in memory or on disk, since it internally buffers parts of ~5 MB and uploads them for you as a multipart upload.
There is also a GitHub issue open in the SDK repository that one can follow for updates.
Upvotes: 10
Reputation: 191
You don't say what language you're using, but I'll assume Java based on your capitalization. In which case the answer is yes: TransferManager has an upload() method that takes a PutObjectRequest, and you can construct that object around a stream.
However, there are two important caveats. The first is in the documentation for PutObjectRequest:
When uploading directly from an input stream, content length must be specified before data can be uploaded to Amazon S3
So you have to know how much data you're uploading before you start. If you're receiving an upload from the web and have a Content-Length header, then you can get the size from it. If you're just reading a stream of data that's arbitrarily long, then you have to write it to a file first (or the SDK will).
The second caveat is that this really doesn't prevent data loss: your program can still crash in the middle of reading data. One thing that it will prevent is returning a success code to the user before storing the data in S3, but you could do that anyway with a file.
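For reference, a minimal sketch of the stream-based TransferManager path described above (assuming SDK v1, a length known up front, and placeholder bucket/key/stream names):
// Uses com.amazonaws.services.s3.transfer.* and com.amazonaws.services.s3.model.ObjectMetadata.
AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
TransferManager tm = TransferManagerBuilder.standard().withS3Client(s3).build();

ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(knownLength); // must be known before the upload starts, per the caveat above

PutObjectRequest request = new PutObjectRequest("bucket", "key", inputStream, metadata);
Upload upload = tm.upload(request);
upload.waitForCompletion(); // blocks until the transfer finishes or throws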
Upvotes: 19