Reputation: 121
We're looking into Google Nearline as a solution for some "warm" storage requirements. Basically we expect parts of a dataset of around 5 PB to be accessed every now and again, but the whole set very infrequently.
That said, there may be one or two times a year when we want to run something across the whole dataset (e.g., patch all the data with a new field). These algorithms would run within GCP (Dataproc). Doing this on Nearline blows our budget by 50k each time.
I'm wondering whether it is possible to change the storage class without incurring the full data retrieval penalty. I see that a storage class can be changed via gsutil rewrite, but this will retrieve the data.
Perhaps we can use a lifecycle rule to change the storage class without a retrieval? Or is there any other way to do it?
Upvotes: 1
Views: 508
Reputation: 1651
The gsutil rewrite operation creates new objects in the target storage class: it reads the GCS objects from one storage class and writes them to another, so new objects get created.
This operation is charged to your project.
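As a rough illustration of how that charge scales, here is a minimal sketch that estimates the retrieval fee for one full read of the dataset. The rate of 0.01 per GB and the decimal PB-to-GB conversion are assumptions, not figures from this thread, and the function name is hypothetical; check the current pricing page before relying on the number. With these assumptions, 5 PB comes out at 50,000, which lines up with the 50k mentioned in the question.

```python
def estimate_retrieval_cost(size_gb: float, rate_per_gb: float) -> float:
    """Return the retrieval fee for reading size_gb of data at rate_per_gb."""
    if size_gb < 0 or rate_per_gb < 0:
        raise ValueError("size and rate must be non-negative")
    return size_gb * rate_per_gb


# Assumptions (not from the thread): decimal units and a 0.01 per GB rate.
DATASET_GB = 5 * 1_000_000      # 5 PB expressed in GB
ASSUMED_RATE_PER_GB = 0.01      # hypothetical Nearline retrieval rate

cost = estimate_retrieval_cost(DATASET_GB, ASSUMED_RATE_PER_GB)
print(f"Estimated retrieval fee for one full pass: {cost:,.0f}")
```

Since a rewrite reads every object, the same estimate applies whether the full pass is a Dataproc job or a storage class change done through a rewrite.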
Upvotes: 2