fredrik

Reputation: 10281

Fast way of deleting non-empty Google bucket?

Is this my only option or is there a faster way?

# Delete contents in bucket (takes a long time on large bucket)
gsutil -m rm -r gs://my-bucket/*

# Remove bucket
gsutil rb gs://my-bucket/

Upvotes: 31

Views: 22391

Answers (9)

Chris Madden

Reputation: 2660

I benchmarked deletes using three techniques:

  • Storage Transfer Service: 1,200–1,500 objects/sec
  • gcloud storage rm: 520 objects/sec
  • gsutil -m rm: 240 objects/sec

The big winner is the Storage Transfer Service. To delete files with it you need a source bucket (or folder in a bucket) that is empty, and then you copy that to a destination bucket (or folder in that bucket) that you want to be empty.

If you're using the GUI, select the corresponding delete option in the advanced transfer options dialog. [Image: Advanced Transfer Options]

You can also create and run the job from the CLI. This example assumes you have access to gs://bucket1/empty/ (which has no objects in it) and you want to delete all objects from gs://bucket2/:

gcloud transfer jobs create \
gs://bucket1/empty/ gs://bucket2/ \
--delete-from=destination-if-unique \
--project my-project

If you want your deletes to happen even faster, create multiple transfer jobs and have them target different sections of the bucket. Because each job does a bucket listing to find the files to delete, make the destination paths non-overlapping (e.g. gs://bucket2/folder1/ and gs://bucket2/folder2/, etc.). The jobs run in parallel, getting the whole delete done in less total time.

Usually I like this better than using Object Lifecycle Management (OLM) because it starts right away (no waiting up to 24 hours for policy evaluation) but there may be times when OLM is the way to go.
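The fan-out idea above can be sketched as a small helper that builds one `gcloud transfer jobs create` command per destination prefix, so each job lists a non-overlapping section of the bucket. The bucket, prefix, and project names are placeholders; launching each argv with subprocess.Popen runs the jobs in parallel.

```python
def transfer_delete_commands(empty_src, dest_bucket, prefixes, project):
    """Return one `gcloud transfer jobs create` argv per prefix."""
    return [
        [
            "gcloud", "transfer", "jobs", "create",
            empty_src, f"gs://{dest_bucket}/{prefix}/",
            "--delete-from=destination-if-unique",
            f"--project={project}",
        ]
        for prefix in prefixes
    ]
```
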

Upvotes: 6

Renan Ceratto

Reputation: 150

I tried both approaches (an expiration lifecycle rule and running gsutil directly on the bucket root), but I couldn't wait for the expiration rule to take effect.

gsutil rm was deleting about 200 files per second, so I opened several terminals and ran gsutil rm with different "folder" prefixes plus a wildcard,

e.g.:

gsutil -m rm -r gs://my-bucket/a*
gsutil -m rm -r gs://my-bucket/b*
gsutil -m rm -r gs://my-bucket/c*

In this example, the combined commands delete about 600 files per second, so you just need to open more terminals and find the right prefixes to delete more files. If one wildcard matches a huge number of objects, you can split it further, like this:

gsutil -m rm -r gs://my-bucket/b1*
gsutil -m rm -r gs://my-bucket/b2*
gsutil -m rm -r gs://my-bucket/b3*
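Instead of opening separate terminals, the same fan-out can be scripted. A minimal sketch, assuming the hypothetical prefixes split the bucket roughly evenly: start one `gsutil -m rm` process per prefix and wait for all of them.

```python
import subprocess

def parallel_rm(bucket, prefixes, runner=subprocess.Popen):
    """Start one `gsutil -m rm` per prefix and wait for all of them."""
    procs = [
        runner(["gsutil", "-m", "rm", "-r", f"gs://{bucket}/{p}*"])
        for p in prefixes
    ]
    for proc in procs:
        proc.wait()
```

The `runner` parameter only exists so the helper can be exercised without gsutil installed; in real use, leave it at the default.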

Upvotes: 0

Travis Hobrla

Reputation: 5511

Buckets are required to be empty before they're deleted. So before you can delete a bucket, you have to delete all of the objects it contains.

You can do this with gsutil rm -r (documentation). Just don't pass the * wildcard and it will delete the bucket itself after it has deleted all of the objects.

gsutil -m rm -r gs://my-bucket

Google Cloud Storage bucket deletes can't succeed until the bucket listing returns 0 objects. If objects remain, you can get a Bucket Not Empty error (or in the UI's case 'Bucket Not Ready') when trying to delete the bucket.

gsutil has built-in retry logic to delete both buckets and objects.

Upvotes: 37

nyet

Reputation: 596

Shorter one liner for the lifecycle change:

gsutil lifecycle set <(echo '{"rule":[{"action":{"type":"Delete"},"condition":{"age":0}}]}') gs://MY-BUCKET

I've also had good luck creating an empty bucket then starting a transfer to the bucket I want to empty out. Our largest bucket took about an hour to empty this way; the lifecycle method seems to take at least a day.

Upvotes: 3

Ravindranath Akila

Reputation: 57

Set an appropriate lifecycle rule on the bucket, e.g. delete objects after one day.

https://cloud.google.com/storage/docs/gsutil/commands/lifecycle

Command (read carefully before copy-pasting):

gsutil lifecycle set [LIFECYCLE_CONFIG_FILE] gs://[BUCKET_NAME]

Example lifecycle configuration (read carefully before copy-pasting):

{
  "rule":
  [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 1}
    }
  ]
}

Then delete the bucket.

This will delete the data asynchronously, so you don't have to keep some background job running on your end.

Upvotes: 5

Tadas Šubonis

Reputation: 1600

This deserves to be summarized and pointed out.

Deleting with gsutil rm is slow if you have LOTS (terabytes) of data

gsutil -m rm -r gs://my-bucket

However, you can specify an expiration for the bucket's objects and let GCS do the work for you. Create a fast-delete.json policy:

{
   "rule":[
      {
         "action":{
            "type":"Delete"
         },
         "condition":{
            "age":0
         }
      }
   ]
}

then apply

gsutil lifecycle set fast-delete.json gs://MY-BUCKET

Thanks, @jterrace and @Janosch

Upvotes: 7

jterrace

Reputation: 67073

Another option is to enable Lifecycle Management on the bucket. You could specify an Age of 0 days and then wait a couple days. All of your objects should be deleted.

Upvotes: 35

Chege

Reputation: 359

Using the Python client, you can force a delete within your script with:

bucket.delete(force=True)

Look for a similar option in your language's client library.

Github thread that discusses this
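A minimal sketch of the call above, using the google-cloud-storage library. Note that `force=True` makes the client delete the contained objects before the bucket, but it refuses buckets holding more than 256 objects, so this suits small buckets. The client is passed in as a parameter here purely so the helper can be exercised with a stub.

```python
def force_delete_bucket(client, name):
    """Delete bucket `name` and its objects; `client` is a
    google.cloud.storage.Client (or a compatible stub)."""
    bucket = client.get_bucket(name)
    bucket.delete(force=True)  # deletes the objects, then the bucket

# Typical use (requires credentials):
#   from google.cloud import storage
#   force_delete_bucket(storage.Client(), "my-bucket")
```
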

Upvotes: 8

Antxon

Reputation: 1943

Remove the bucket from the Developers Console. It will ask for confirmation before deleting a non-empty bucket. It works like a charm ;)

Upvotes: -1
