Seb

Reputation: 152

Migrating data from S3 to Google Cloud Storage

I need to move a large number of files (on the order of tens of terabytes in total) from Amazon S3 into Google Cloud Storage. Every file in S3 is under 500 MB.

So far I have tried gsutil cp with the parallel option (-m), using S3 as the source and GS as the destination directly. Even after tweaking the multi-processing and multi-threading parameters, I haven't been able to achieve more than 30 MB/s.
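To put that rate in perspective, here is a rough back-of-the-envelope estimate. The 20 TB figure is only an illustrative stand-in for "tens of terabytes", and the calculation assumes the rate stays constant with no retries or pauses.

```python
def transfer_days(total_bytes: float, bytes_per_second: float) -> float:
    """Return the number of days needed to move total_bytes at a steady rate."""
    seconds = total_bytes / bytes_per_second
    return seconds / 86_400  # seconds in one day


TB = 10**12  # decimal terabyte
MB = 10**6   # decimal megabyte

# Hypothetical 20 TB data set at the 30 MB/s ceiling observed so far.
days = transfer_days(20 * TB, 30 * MB)
print(f"{days:.1f} days")  # prints "7.7 days"
```

At that rate a single copy stream would take over a week, which is why spreading the work across several machines looks necessary.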

What I am now contemplating:

If the first option were supported, I would really appreciate details on how to do that. However, it seems like I'm going to have to figure out how to do the second one. I'm unsure how to pursue this avenue, because I would need to keep track of the gsutil resumable transfer feature on many nodes, and I'm generally inexperienced at running this sort of Hadoop job.

Any help on how to pursue one of these avenues (or something simpler I haven't thought of) would be greatly appreciated.

Upvotes: 1

Views: 2359

Answers (2)

rein

Reputation: 33465

Google has recently released the Cloud Storage Transfer Service, which is designed to transfer large amounts of data from S3 to GCS: https://cloud.google.com/storage/transfer/getting-started

(I realize this answer is a little late for the original question but it may help future visitors with the same question.)
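For readers who prefer to script this rather than use the console, a transfer job is described by a JSON body. The sketch below only builds that body as a Python dict; the field names follow the v1 REST `transferJobs` resource as I understand it, so please check them against the current reference before relying on them. The project ID, bucket names, and credentials are hypothetical placeholders.

```python
import json
from datetime import date


def build_transfer_job(project_id: str, s3_bucket: str, gcs_bucket: str,
                       aws_key_id: str, aws_secret: str, run_on: date) -> dict:
    """Build a request body for a transfer job that copies S3 -> GCS.

    Assumption: field names match the Storage Transfer Service v1 REST API.
    """
    day = {"year": run_on.year, "month": run_on.month, "day": run_on.day}
    return {
        "description": f"Copy {s3_bucket} to {gcs_bucket}",
        "status": "ENABLED",
        "projectId": project_id,
        # Assumption: identical start and end dates request a single run.
        "schedule": {"scheduleStartDate": day, "scheduleEndDate": day},
        "transferSpec": {
            "awsS3DataSource": {
                "bucketName": s3_bucket,
                "awsAccessKey": {
                    "accessKeyId": aws_key_id,
                    "secretAccessKey": aws_secret,
                },
            },
            "gcsDataSink": {"bucketName": gcs_bucket},
        },
    }


# All values below are placeholders, not real identifiers.
job = build_transfer_job("my-project", "my-source-bucket", "my-dest-bucket",
                         "AWS_KEY_ID", "AWS_SECRET", date(2016, 1, 15))
print(json.dumps(job, indent=2))
```

The resulting body would then be submitted through whichever client or HTTP tool you already use for Google APIs; that step is left out here on purpose.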

Upvotes: 3

Mike Schwartz

Reputation: 12155

You could set up a Google Compute Engine (GCE) account and run gsutil from GCE to import the data. You can start up multiple GCE instances, each importing a subset of the data. That's part of one of the techniques covered in the talk we gave at Google I/O 2013 called "Importing Large Data Sets into Google Cloud Storage".
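One simple way to give each instance its own subset is to split the source bucket by key prefix. The following is a minimal sketch of that idea, not the method from the talk; the bucket name and prefixes are hypothetical, and it assumes the prefixes hold roughly similar amounts of data.

```python
from typing import List


def shard_prefixes(prefixes: List[str], num_instances: int) -> List[List[str]]:
    """Deal prefixes round-robin so each instance gets a disjoint subset."""
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    shards: List[List[str]] = [[] for _ in range(num_instances)]
    # Sorting first makes the assignment reproducible across runs.
    for i, prefix in enumerate(sorted(prefixes)):
        shards[i % num_instances].append(prefix)
    return shards


# Hypothetical top-level prefixes in the source bucket, one per month.
prefixes = [f"s3://my-source-bucket/logs/2013-{m:02d}/" for m in range(1, 13)]
for idx, shard in enumerate(shard_prefixes(prefixes, 4)):
    print(f"instance {idx}: {shard}")
```

Because the assignment is deterministic, an instance that is restarted can recompute its own shard without coordinating with the others.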

One other thing you'll want to do if you use this approach is to use the gsutil cp -L and -n options. -L creates a manifest that records details about what has been transferred, and -n allows you to avoid re-copying files that were already copied (in case you restart the copy from the beginning, e.g., after an interruption). I suggest you update to gsutil version 3.30 (which will come out in the next week or so), which improves how the -L option works for this kind of copying scenario.
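As a rough illustration, the command each instance runs could be assembled like this. The -m, -n, and -L options are the ones discussed in this thread; the -r (recursive) flag, the bucket names, and the manifest path are assumptions added for the example.

```python
import shlex
from typing import List


def build_copy_command(src: str, dst: str, manifest_path: str) -> List[str]:
    """Return a gsutil argument list for a restartable, logged copy."""
    return [
        "gsutil", "-m",       # -m: run copies in parallel
        "cp",
        "-n",                 # -n: skip objects that already exist at dst
        "-L", manifest_path,  # -L: write a manifest of what was transferred
        "-r",                 # -r: recurse under the source prefix (assumed)
        src, dst,
    ]


# Hypothetical source prefix, destination bucket, and manifest file.
cmd = build_copy_command("s3://my-source-bucket/logs/2013-01/",
                         "gs://my-dest-bucket/logs/",
                         "manifest-2013-01.csv")
print(shlex.join(cmd))
```

Using a separate manifest file per shard keeps the logs from different instances from colliding, and re-running the same command after an interruption relies on -n to skip what has already been copied.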

Mike Schwartz, Google Cloud Storage team

Upvotes: 5
