S. Drazic

Reputation: 13

Is there a way to stream data to amazon s3 files using aws-sdk-go that is similar to google storage Write() method?

We're currently doing a transition from Google Storage to Amazon S3 storage.

On Google Storage I've used this function https://godoc.org/cloud.google.com/go/storage#Writer.Write to write to files. It basically streams bytes of data into a file through the io.Writer interface and saves the file when Close() is called on the writer. That lets us stream data into a file all day long and finalize it at the end of the day without ever creating a local copy of the file.

I've examined the aws-sdk-go S3 documentation on godoc and can't seem to find a similar function that would allow us to just stream data to a file without creating a file locally first. All I've found are functions that upload data from already existing local files, like PutObject().

So my question is: Is there a way to stream data to amazon s3 files using aws-sdk-go that is similar to google storage Write() method?

Upvotes: 1

Views: 5698

Answers (1)

johlo

Reputation: 5500

The S3 HTTP API doesn't have any append-like write method; instead it uses multipart uploads. You basically upload fixed-size chunks with an index number, and S3 stores them internally as separate objects and concatenates them into a single object when the upload is completed. The minimum chunk size is 5 MB (except for the last chunk) and an upload can have at most 10,000 chunks (a hard limit).

Unfortunately it doesn't look like the aws-sdk-go API provides any convenient interface for working with chunks to achieve the streaming behaviour.

You would have to work with the chunks manually (called parts in aws-sdk-go) directly using CreateMultipartUpload to initialize the transfers, create UploadPartInput instances for the data you want to send and send it with UploadPart. When the final chunk has been sent you need to close the transaction with CompleteMultipartUpload.

Regarding the question of how to stream directly from e.g. []byte data instead of a file: the Body field of the UploadPartInput struct is where you put the content you want to send to S3. Note that Body is of type io.ReadSeeker, so you can create an io.ReadSeeker from your []byte content with something like bytes.NewReader([]byte) and set UploadPartInput.Body to that.

The s3manager upload utility could be a good starting point to see how the multipart functions are used; it uses the multipart API to upload a single large file as smaller chunks concurrently.

Keep in mind that you should set a lifecycle policy that removes unfinished multipart uploads. If you don't send the final CompleteMultipartUpload all the chunks that have been uploaded will stay in S3 and incur costs. The policy can be set through AWS console/CLI or programmatically with aws-sdk-go.

Upvotes: 5
