Reputation: 41
I need to work with huge files in Amazon S3. How can I get just part of a huge file from S3? The best way would be to get a stream that supports seeking.
Unfortunately, the CanSeek property of response.ResponseStream is false:
GetObjectRequest request = new GetObjectRequest();
request.BucketName = BUCKET_NAME;
request.Key = NumIdToAmazonKey(numID);
GetObjectResponse response = client.GetObject(request);
Upvotes: 4
Views: 4373
Reputation: 841
Way late for the OP, but I've just posted an article and code demonstration of a SeekableS3Stream that performs reasonably well in real-world use cases.
https://github.com/mlhpdx/seekable-s3-stream
Specifically, I demonstrate reading a single small file from a much larger ISO disk image using the DiscUtils library, unmodified. This works by implementing a random-access stream that uses Range requests to pull sections of the file as needed and keeps them in an MRU list, which avoids re-downloading ranges for hot data structures in the file (e.g. zip central directory records).
The use is similarly simple:
using System;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using DiscUtils.Iso9660;
namespace Seekable_S3_Stream
{
    class Program
    {
        const string BUCKET = "rds.nsrl.nist.gov";
        const string KEY = "RDS/current/RDS_ios.iso"; // "RDS/current/RDS_modern.iso";
        const string FILENAME = "READ_ME.TXT";

        static async Task Main(string[] args)
        {
            var s3 = new AmazonS3Client();
            using var stream = new Cppl.Utilities.AWS.SeekableS3Stream(s3, BUCKET, KEY, 1 * 1024 * 1024, 4);
            using var iso = new CDReader(stream, true);
            using var file = iso.OpenFile(FILENAME, FileMode.Open, FileAccess.Read);
            using var reader = new StreamReader(file);
            var content = await reader.ReadToEndAsync();
            await Console.Out.WriteLineAsync($"{stream.TotalRead / (float)stream.Length * 100}% read, {stream.TotalLoaded / (float)stream.Length * 100}% loaded");
        }
    }
}
Upvotes: 0
Reputation: 21
After a frustrating afternoon with the same problem, I found the static class AmazonS3Util (https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/S3/TS3Util.html), which has a MakeStreamSeekable method.
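For reference, here is a minimal sketch of how that helper might be used, assuming the AWSSDK.S3 v3 package. The bucket and key names are placeholders. As far as I can tell, MakeStreamSeekable copies the whole response into an in-memory stream, so it suits objects that fit in memory rather than truly huge files.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;
using Amazon.S3.Util;

class MakeSeekableExample
{
    static async Task Main()
    {
        // "my-bucket" and "path/to/object.bin" are placeholder names.
        using var client = new AmazonS3Client();
        var request = new GetObjectRequest
        {
            BucketName = "my-bucket",
            Key = "path/to/object.bin"
        };

        using GetObjectResponse response = await client.GetObjectAsync(request);

        // Reads the response stream to its end and returns a seekable copy.
        using Stream seekable = AmazonS3Util.MakeStreamSeekable(response.ResponseStream);

        // Jump to (at most) the last 16 bytes and read them.
        seekable.Seek(-Math.Min(16, seekable.Length), SeekOrigin.End);
        var tail = new byte[16];
        int read = seekable.Read(tail, 0, tail.Length);
        Console.WriteLine($"CanSeek: {seekable.CanSeek}, read {read} trailing bytes");
    }
}
```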
Upvotes: 2
Reputation: 5700
I know this isn't exactly what the OP is asking for, but I needed a seekable S3 stream so I could read Parquet files without downloading them, so I gave it a shot here: https://github.com/mukunku/RandomHelpers/blob/master/SeekableS3Stream.cs
Performance wasn't as bad as I expected. You can use the TimeWastedSeeking property to see how much time is being wasted by allowing Seek() on an S3 stream.
Here's an example of how to use it:
using (var client = new AmazonS3Client(credentials, Amazon.RegionEndpoint.USEast1))
{
    using (var stream = SeekableS3Stream.OpenFile(client, "myBucket", "path/to/myfile.txt", true))
    {
        // stream is seekable!
    }
}
Upvotes: 2
Reputation: 3172
You could do the following to read a certain part of your file:
GetObjectRequest request = new GetObjectRequest
{
    BucketName = bucketName,
    Key = keyName,
    ByteRange = new ByteRange(0, 10)
};
See the documentation
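To flesh that out a little, here is a minimal sketch of a complete ranged read, assuming the AWSSDK.S3 v3 async API. The ReadRangeAsync helper and the bucket/key names are made up for illustration. Note that the range is inclusive on both ends, so ByteRange(0, 10) asks for 11 bytes.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

static class RangedRead
{
    // Hypothetical helper: downloads bytes [start, end] (inclusive) of an object.
    public static async Task<byte[]> ReadRangeAsync(
        IAmazonS3 client, string bucket, string key, long start, long end)
    {
        var request = new GetObjectRequest
        {
            BucketName = bucket,
            Key = key,
            ByteRange = new ByteRange(start, end)
        };

        using GetObjectResponse response = await client.GetObjectAsync(request);
        using var buffer = new MemoryStream();

        // Only the requested range is sent back, so this copy stays small.
        await response.ResponseStream.CopyToAsync(buffer);
        return buffer.ToArray();
    }

    static async Task Main()
    {
        // "my-bucket" and "path/to/object.bin" are placeholder names.
        using var client = new AmazonS3Client();
        byte[] head = await ReadRangeAsync(client, "my-bucket", "path/to/object.bin", 0, 10);
        Console.WriteLine($"Got {head.Length} bytes");
    }
}
```

Calling this repeatedly with different offsets is roughly what the seekable stream wrappers in the other answers do behind Seek() and Read().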
Upvotes: 6