user38643

Reputation: 469

GCS cost for running gsutil du?

I have a large bucket (on the order of PiB) and I'd like to run some regex queries to find out how many bytes certain paths take up.

gsutil du -s -a gs://.... works well at a small scale, but I have two questions:

  1. Is there a better way to analyze size for redundant paths in GCS than gsutil du?
  2. Is there an associated cost for running this command on my bucket?

Upvotes: 1

Views: 1673

Answers (3)

phduc.06

Reputation: 1

To answer your second question ("Is there an associated cost for running this command on my bucket?"): yes.

I was charged $20 today under Class A Operations, and the only things I did were upload files to my bucket and check the bucket size with gsutil du -s.

The documentation mentions this explicitly:

Caution: The gsutil du command calculates the current space usage by making a series of object listing requests, which can take a long time for large buckets. If the number of objects in your bucket is hundreds of thousands or more, or if you want to monitor your bucket size over time, use Monitoring instead, as described in the Console tab.

Upvotes: 0

guillaume blaquiere

Reputation: 75745

With Cloud Storage, you can't search for objects by regex, only by prefix. If you want regex matching, you have to mirror the object names somewhere else and search that copy for the pattern you want.

How do you mirror them? You have to do it yourself :(

As for the gsutil du command, it's pretty simple: the gsutil binary queries the Cloud Storage API to list the objects. The object metadata (in particular the object size) is included in the API response, and gsutil aggregates the results. That means 1 Class A operation per 1000 objects (the maximum page size).
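The same idea can be sketched in Python: list by prefix, apply the regex client-side, and sum the sizes. This is only a rough sketch, not how gsutil itself is written. The names `Obj`, `bytes_matching`, `list_calls_needed`, `estimated_list_cost` and `list_bucket` are made up for illustration, and `price_per_1000_ops` is a value you would have to take from the pricing page for your storage class. `list_bucket` assumes the google-cloud-storage package is installed and credentials are configured.

```python
import math
import re
from collections import namedtuple

# Stand-in for a listed object. Blob objects returned by
# google-cloud-storage expose the same two attributes (name, size).
Obj = namedtuple("Obj", ["name", "size"])


def bytes_matching(objects, pattern):
    """Sum the size of every object whose name matches the regex."""
    rx = re.compile(pattern)
    return sum(o.size for o in objects if rx.search(o.name))


def list_calls_needed(object_count, page_size=1000):
    """Number of list requests, assuming page_size objects per page."""
    return math.ceil(object_count / page_size)


def estimated_list_cost(object_count, price_per_1000_ops, page_size=1000):
    """Rough listing cost. price_per_1000_ops is an assumption you supply."""
    return list_calls_needed(object_count, page_size) * price_per_1000_ops / 1000


def list_bucket(bucket_name, prefix):
    """Hypothetical helper: list objects under a prefix in a real bucket."""
    # Imported here so the functions above work without the library.
    from google.cloud import storage

    return storage.Client().list_blobs(bucket_name, prefix=prefix)
```

For a real bucket you would call something like `bytes_matching(list_bucket(bucket_name, prefix), pattern)`. Narrowing the prefix as much as possible before applying the regex keeps the number of listed objects, and so the number of list requests, down.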

Upvotes: -1

Chaotic Pechan

Reputation: 966

I think gsutil du is the tool to use for this analysis. There is no faster way to do it.

But if you need to do it regularly, you may want to enable bucket logging. You can read more about it here: https://cloud.google.com/storage/docs/access-logs#delivery
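If you go the logging route, the daily storage logs could be read with something like the sketch below. It assumes the storage log is a CSV with the two columns `bucket` and `storage_byte_hours`, and that dividing byte-hours by 24 gives the average size for the day, as the logging documentation describes. The function name `average_bucket_bytes` is made up for illustration. Note that this gives the size of the whole bucket, not of individual paths.

```python
import csv
import io


def average_bucket_bytes(storage_log_text):
    """Map each bucket name to its average size in bytes for the day.

    Assumes a CSV with a header row and the columns
    "bucket" and "storage_byte_hours".
    """
    reader = csv.DictReader(io.StringIO(storage_log_text))
    # byte-hours over a 24 hour period, divided by 24, gives average bytes.
    return {row["bucket"]: int(row["storage_byte_hours"]) / 24 for row in reader}
```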

As for the cost, the object listing requests that gsutil du makes count as Class A operations:

https://cloud.google.com/storage/pricing

Upvotes: 1
