Dervall

Reputation: 5744

Amazon S3 Transferutility use FilePath or Stream?

When uploading a file to S3 using the TransferUtility class, there is an option to use either FilePath or an input stream. I'm using multi-part uploads.

I'm uploading a variety of things, some of which are files on disk and others of which are raw streams. I'm currently using the InputStream variant for everything, which works fine, but I'm wondering whether I should specialize the method further. For the files on disk, I'm basically using File.OpenRead and passing that stream to the InputStream of the transfer request.

Are there any performance gains, or other reasons, to prefer the FilePath method over the InputStream one when the input is known to be a file?

In short: is this the same thing:

using (var fs = File.OpenRead("some path")) 
{
    var uploadMultipartRequest = new TransferUtilityUploadRequest
    {
        BucketName = "defaultBucket",
        Key = "key",
        InputStream = fs,
        PartSize = partSize
    };

    using (var transferUtility = new TransferUtility(s3Client))
    {
        await transferUtility.UploadAsync(uploadMultipartRequest);
    }
}

As:

    var uploadMultipartRequest = new TransferUtilityUploadRequest
    {
        BucketName = "defaultBucket",
        Key = "key",
        FilePath = "some path",
        PartSize = partSize
    };

    using (var transferUtility = new TransferUtility(s3Client))
    {
        await transferUtility.UploadAsync(uploadMultipartRequest);
    }

Or is there any significant difference between the two? I know whether the files are large or not, and could prefer one method or the other based on that.

Edit: I've also done some decompiling of the S3Client, and there does indeed seem to be some difference with regard to the concurrency level of the transfer, as found in MultipartUploadCommand.cs:

private int CalculateConcurrentServiceRequests()
{
  int num = !this._fileTransporterRequest.IsSetFilePath() || this._s3Client is AmazonS3EncryptionClient ? 1 : this._config.ConcurrentServiceRequests;
  if (this._totalNumberOfParts < num)
    num = this._totalNumberOfParts;
  return num;
}
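Reading that decompiled check, the concurrency setting only takes effect when FilePath is set (and the client is not an AmazonS3EncryptionClient); with a stream it is forced to 1. A minimal sketch of tuning it via TransferUtilityConfig, assuming an existing IAmazonS3 s3Client and that the rest of the request matches the question's example:

```csharp
// Sketch: ConcurrentServiceRequests only applies to FilePath uploads,
// per the decompiled CalculateConcurrentServiceRequests above.
// Assumes an existing IAmazonS3 s3Client from the surrounding code.
var config = new TransferUtilityConfig
{
    // Number of parts uploaded in parallel; stream uploads ignore this
    // and run with a single request.
    ConcurrentServiceRequests = 10
};

var uploadMultipartRequest = new TransferUtilityUploadRequest
{
    BucketName = "defaultBucket",
    Key = "key",
    FilePath = "some path",   // FilePath, not InputStream, enables concurrency
    PartSize = partSize
};

using (var transferUtility = new TransferUtility(s3Client, config))
{
    await transferUtility.UploadAsync(uploadMultipartRequest);
}
```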

Upvotes: 2

Views: 14127

Answers (2)

Troy DeMonbreun

Reputation: 3890

I think the difference may be that both use the Multipart Upload API, but using a FilePath allows for concurrent uploads, whereas:

When you're using a stream for the source of data, the TransferUtility class does not do concurrent uploads.

https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingTheMPDotNetAPI.html

Upvotes: 0

Amr Elgarhy

Reputation: 69012

From the TransferUtility documentation:

When uploading large files by specifying file paths instead of a stream, TransferUtility uses multiple threads to upload multiple parts of a single upload at once. When dealing with large content sizes and high bandwidth, this can increase throughput significantly.

This suggests that using file paths will use concurrent multipart upload, but using a stream won't.

But when I read through the Upload(stream, bucketName, key) method documentation:

Uploads the contents of the specified stream. For large uploads, the file will be divided and uploaded in parts using Amazon S3's multipart API. The parts will be reassembled as one object in Amazon S3.

This means that multipart upload is used for streams as well.

Amazon recommends using multipart upload when the file size is larger than 100 MB: http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html

Multipart upload allows you to upload a single object as a set of parts. Each part is a contiguous portion of the object's data. You can upload these object parts independently and in any order. If transmission of any part fails, you can retransmit that part without affecting other parts. After all parts of your object are uploaded, Amazon S3 assembles these parts and creates the object. In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation.

Using multipart upload provides the following advantages:

Improved throughput—You can upload parts in parallel to improve throughput.

Quick recovery from any network issues—Smaller part size minimizes the impact of restarting a failed upload due to a network error.

Pause and resume object uploads—You can upload object parts over time. Once you initiate a multipart upload there is no expiry; you must explicitly complete or abort the multipart upload.

Begin an upload before you know the final object size—You can upload an object as you are creating it.
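The independent-parts workflow the quoted documentation describes maps onto the SDK's low-level multipart API roughly as follows. This is a sketch, assuming an existing IAmazonS3 s3Client and the hypothetical bucket/key/path names from the question; TransferUtility does all of this for you, including the AbortMultipartUpload cleanup on failure, which is omitted here:

```csharp
// Sketch of the low-level multipart flow: initiate, upload parts
// independently, then ask S3 to reassemble them into one object.
var init = await s3Client.InitiateMultipartUploadAsync(
    new InitiateMultipartUploadRequest { BucketName = "defaultBucket", Key = "key" });

var partETags = new List<PartETag>();
long partSize = 5 * 1024 * 1024;                 // 5 MB is the S3 minimum part size
long fileLength = new FileInfo("some path").Length;

long filePosition = 0;
for (int partNumber = 1; filePosition < fileLength; partNumber++)
{
    var partResponse = await s3Client.UploadPartAsync(new UploadPartRequest
    {
        BucketName = "defaultBucket",
        Key = "key",
        UploadId = init.UploadId,
        PartNumber = partNumber,
        PartSize = partSize,                     // the last part may be smaller
        FilePath = "some path",
        FilePosition = filePosition
    });
    partETags.Add(new PartETag(partNumber, partResponse.ETag));
    filePosition += partSize;
}

// S3 assembles the uploaded parts into a single object.
await s3Client.CompleteMultipartUploadAsync(new CompleteMultipartUploadRequest
{
    BucketName = "defaultBucket",
    Key = "key",
    UploadId = init.UploadId,
    PartETags = partETags
});
```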

So, based on the Amazon S3 documentation, there is no difference between using a stream or a file path, but it might make a slight performance difference depending on your code and OS.

Upvotes: 3
