caarlos0

Reputation: 20633

Fast way to delete big folder on GCS bucket

I have a very big GCS bucket (several TB), with several sub directories, each with a couple terabytes of data.

I want to delete some of those folders.

I tried to use gsutil from a Cloud Shell, but it is taking ages.

For reference, here is the command I'm using:

gsutil -m rm -r "gs://BUCKET_NAME/FOLDER"

I was looking at this question, and thought maybe I could use that, but it seems like it can't filter by folder name, and I can't filter on anything else because the folders have mixed content.

So far, my last resort would be to wait until the folders I want to delete are "old", and set the lifecycle rule accordingly, but that could take too long.

Are there any other ways to make this faster?

Upvotes: 0

Views: 2813

Answers (3)

David Spenard

Reputation: 809

GCS deletes are done asynchronously and will run for however long they take, even if that means days or weeks. An unfortunate limitation of GCS delete operations is that there is no ETA and no way to check the status of a deletion. Google's stated best practice is to let lifecycle rules take care of deletions and let the rest happen in the background. So as long as you turn off versioning, remove any retention policy, and don't have lifecycle rules preventing immediate deletion of objects, you should be fine.
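If it helps, the versioning and retention settings mentioned above can be checked and cleared from the command line. A minimal sketch with gsutil, where BUCKET_NAME is a placeholder:

# Check whether object versioning is enabled, then turn it off.
gsutil versioning get gs://BUCKET_NAME
gsutil versioning set off gs://BUCKET_NAME

# Check for a retention policy and clear it if one is set (and not locked).
gsutil retention get gs://BUCKET_NAME
gsutil retention clear gs://BUCKET_NAME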

If you set up a proper lifecycle policy [1], you can avoid charges on the objects being deleted, because you are not charged for storage after the object expiration time even if the object is not deleted immediately [2]. So rather than worrying about deleting objects as quickly as possible, the more important concern is the storage cost of the objects while they are being deleted, which can take days or even weeks for petabytes of data.
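For example, a simple age-based delete rule could look roughly like this; the bucket name and the 30-day threshold are placeholders:

cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 30}
    }
  ]
}
EOF

gsutil lifecycle set lifecycle.json gs://BUCKET_NAME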

On a related note, there is no direct way to stop a delete operation after it begins, if that ever becomes a need for you. The recommended workaround is to revoke the delete permission from the principal who invoked the original deletion request; this should cause the ongoing deletion to fail quickly. Once the console indicates that it has failed, it is safe to reinstate the permissions that were revoked. I know this is rather bizarre, but it is what Google Support actually recommends.
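Assuming the relevant role was granted at the bucket level, revoking and later restoring it could look roughly like this (the member and role are placeholders):

# Revoke the role that grants storage.objects.delete from the principal
# that started the deletion.
gsutil iam ch -d user:someone@example.com:objectAdmin gs://BUCKET_NAME

# Once the console shows the deletion has failed, restore the role.
gsutil iam ch user:someone@example.com:objectAdmin gs://BUCKET_NAME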

I would also like to point out the Google issue tracker link [3] for bulk deletion of large buckets, so you can follow that request and get future updates as it progresses.

[1] https://cloud.google.com/storage/docs/lifecycle#behavior

[2] https://cloud.google.com/storage/docs/lifecycle#behavior:~:text=You%20are%20not%20charged%20for%20storage%20after%20the%20object%20expiration%20time%20even%20if%20the%20object%20is%20not%20deleted%20immediately.

[3] https://issuetracker.google.com/issues/35901840

Upvotes: 0

Nealvs

Reputation: 306

Creating a lifecycle rule with matchesPrefix set to the folder name is the best way to remove large folders in a bucket. It can take up to 24 hours to take effect, though. https://cloud.google.com/storage/docs/lifecycle#matchesprefix-suffix
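A rough sketch of such a rule, assuming the prefix is FOLDER/ and that matching objects should be deleted regardless of age (file and bucket names are placeholders):

cat > delete-folder.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {
        "age": 0,
        "matchesPrefix": ["FOLDER/"]
      }
    }
  ]
}
EOF

gsutil lifecycle set delete-folder.json gs://BUCKET_NAME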

Upvotes: 1

mhouglum

Reputation: 2593

It's just going to take a long time; you have to issue a DELETE request for each object with the prefix FOLDER/.

GCS doesn't have the concept of "folders". Object names can share a common prefix, but they're all in a flat namespace. For example, if you have these three objects:

  • /a/b/c/1.txt
  • /a/b/c/2.txt
  • /a/b/c/3.txt

...then you don't actually have folders named a, b, or c. Once you deleted those three objects, the "folders" (i.e. the prefix that they shared) would no longer appear when you listed objects in your bucket.
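To make the per-object nature of the delete concrete, something like the following is roughly what gsutil -m rm -r does for you: list every object under the prefix, then issue one DELETE per object (the bucket, prefix, and batch/parallelism values are placeholders):

# List all objects under the prefix, then delete them in parallel batches;
# each object still costs one DELETE request.
gsutil ls "gs://BUCKET_NAME/FOLDER/**" | xargs -n 100 -P 8 gsutil -m rm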

See the docs for more details:

https://cloud.google.com/storage/docs/gsutil/addlhelp/HowSubdirectoriesWork

Upvotes: 2
