Reputation: 172
I'm currently trying upload and then untar a very large file (1.3 tb) into Google Cloud Storage at the lowest price.
I initially thought about creating a really cheap instance just to download the file and put it in a bucket, then creating a new instance with a good amount of RAM to untar the file and then put the result in a new bucket. However since the bucket price depends on the nbr of request I/O I'm not sure it's the best option, and even for performance it might not be the best.
What would be the best strategy to untar the file in the cheapest way?
Upvotes: 4
Views: 2441
Reputation: 7737
First some background information on pricing:
Google has pretty good documentation about how to ingest data into GCS. From that guide:
Today, when you move data to Cloud Storage, there are no ingress traffic charges. The gsutil tool and the Storage Transfer Service are both offered at no charge. See the GCP network pricing page for the most up-to-date pricing details.
The "network pricing page" just says:
[Traffic type: Ingress] Price: No charge, unless there is a resource such as a load balancer that is processing ingress traffic. Responses to requests count as egress and are charged.
There is additional information on the GCS pricing page about your idea to use a GCE VM to write to GCS:
There are no network charges for accessing data in your Cloud Storage buckets when you do so with other GCP services in the following scenarios:
- Your bucket and GCP service are located in the same multi-regional or regional location. For example, accessing data in an
asia-east1
bucket with anasia-east1
Compute Engine instance.
From later in that same page, there is also information about the pre-request pricing:
Class A Operations: storage.*.insert[1]
[1] Simple, multipart, and resumable uploads with the JSON API are each considered one Class A operation.
The cost for Class A operations is per 10,000 operations, and is either $0.05 or $0.10 depending on the storage type. I believe you would only be doing 1 Class A operation (or at most, 1 Class A operation per file that you upload), so this probably wouldn't add up to much usage overall.
Now to answer your question:
For your use case, it sounds like you want to have the files in the tarball be individual files in GCS (as opposed to just having a big tarball stored in one file in GCS). The first step is to untar it somewhere, and the second step is to use gsutil cp
to copy it to GCS.
Unless you have to (i.e. not enough space on the machine that holds the tarball now), I wouldn't recommend copying the tarball to an intermediate VM in GCE before uploading to GCE, for two reasons:
gsutil cp
already handles a bunch of annoying edge cases for you: parallel uploads, resuming an upload in case there's a network failure, retries, checksum comparisons, etc.If you want to try the procedure out with something lower-risk first, make a small directory with a few megabytes of data and a few files and use gsutil cp
to copy it, then check how much you were charged for that. From the GCS pricing page:
Charges accrue daily, but Cloud Storage bills you only at the end of the billing period. You can view unbilled usage in your project's billing page in the Google Cloud Platform Console.
So you'd just have to wait a day to see how much you were billed.
Upvotes: 3