Paul Wolfe

Reputation: 121

Google Cloud Storage: changing storage class without rewrite

We're looking into Google Nearline as a solution for some "warm" storage requirements. Basically we expect parts of a dataset of around 5 PB to be accessed every now and again, but the whole set very infrequently.

That said, there may be one or two times a year when we want to run something across the whole dataset (e.g. patch all the data with a new field). These algorithms would run within GCP (Dataproc). Doing this on Nearline blows our budget by about 50k each time.

Is there a way to change the storage class without incurring the full data retrieval penalty? I see that a storage class can be changed via a gsutil rewrite, but this will retrieve the data.

Perhaps we can use a lifecycle rule to change the storage class without a retrieval? Or is there any other way to do it?

Upvotes: 1

Views: 508

Answers (1)

Raunak Jhawar

Reputation: 1651

A gsutil rewrite creates new objects in the target storage class: it reads each GCS object in its current storage class and writes it out in another, so the data is retrieved and new objects get created.

This operation is charged to your project.
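
As a rough sanity check on the 50k figure from the question, here is a minimal sketch of the retrieval arithmetic. The fee of 0.01 USD per GB for Nearline retrieval and the use of decimal units (1 PB = 1,000,000 GB) are assumptions made for illustration, and the function and constant names are hypothetical; check the current pricing page before relying on the number.

```python
# Rough estimate of the retrieval charge for reading a whole Nearline dataset.
# ASSUMPTION: Nearline retrieval fee of 0.01 USD per GB (verify current pricing).
# ASSUMPTION: decimal units, 1 PB = 1,000,000 GB.
NEARLINE_RETRIEVAL_USD_PER_GB = 0.01
GB_PER_PB = 1_000_000


def retrieval_cost_usd(dataset_pb: float,
                       fee_per_gb: float = NEARLINE_RETRIEVAL_USD_PER_GB) -> float:
    """Return the estimated retrieval charge in USD for a dataset of dataset_pb petabytes."""
    return dataset_pb * GB_PER_PB * fee_per_gb


if __name__ == "__main__":
    # The 5 PB dataset from the question.
    print(f"Estimated retrieval charge: {retrieval_cost_usd(5):,.0f} USD")
```

This only covers the per-GB retrieval fee; operation charges and any early deletion charges from the rewrite are not included.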

Upvotes: 2
