Reputation: 469
I have a large bucket (PiB) and I'm interested in running some regex queries to understand how many bytes certain paths take.
gsutil du -s -a gs://....
works well at a small scale, but I have two questions:
1. Is there a faster way to do this?
2. Is there an associated cost for running this command on my bucket?
gsutil du
Upvotes: 1
Views: 1673
Reputation: 1
To answer your question 2, "Is there an associated cost for running this command on my bucket?": yes, there is.
I was charged $20 today in the category of Class A Operations, and the only things I did were upload files to my bucket and check the bucket size using gsutil du -s.
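For a rough sense of where a charge like that comes from, here is a back-of-the-envelope sketch; the $0.005 per 1,000 Class A operations rate (Standard storage) and the object count are assumptions of mine, not figures from this answer:
# Rough cost estimate for running "gsutil du" over a large bucket.
# Assumptions (not from the answer): Standard-storage Class A pricing of
# $0.005 per 1,000 operations, and one list call per page of 1,000 objects.
objects = 4_000_000_000               # hypothetical object count
list_calls = objects / 1_000          # one Class A call per 1,000 objects listed
cost = list_calls / 1_000 * 0.005     # dollars
print(f"{list_calls:,.0f} list calls -> ${cost:,.2f}")  # ~4,000,000 calls -> ~$20.00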
They explicitly mention this in their documentation:
Caution: The gsutil du command calculates the current space usage by making a series of object listing requests, which can take a long time for large buckets. If the number of objects in your bucket is hundreds of thousands or more, or if you want to monitor your bucket size over time, use Monitoring instead, as described in the Console tab.
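For the Monitoring route the docs point to, here is a minimal sketch (assuming the google-cloud-monitoring Python client; the project and bucket names are placeholders) that reads the daily storage.googleapis.com/storage/total_bytes metric instead of listing objects:
import time
from google.cloud import monitoring_v3

# Read the bucket-size metric that Cloud Monitoring records about once a day,
# which avoids object listing (and the Class A charges that come with it).
client = monitoring_v3.MetricServiceClient()
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": int(time.time())},
        "start_time": {"seconds": int(time.time()) - 48 * 3600},  # last two days
    }
)
results = client.list_time_series(
    request={
        "name": "projects/my-project",  # placeholder project ID
        "filter": (
            'metric.type = "storage.googleapis.com/storage/total_bytes" '
            'AND resource.labels.bucket_name = "my-bucket"'  # placeholder bucket
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value.double_value, "bytes")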
Upvotes: 0
Reputation: 75745
With Cloud Storage, you can't search for objects based on a regex, only based on a prefix. If you want regex matching, you have to mirror the object names elsewhere and search for your pattern there.
How to mirror? You have to do it yourself :(
About the gsutil du command, it's pretty simple: the gsutil binary queries the Cloud Storage API to list the objects. The object metadata (in particular the size) is included in that API response, and gsutil aggregates the results, i.e. 1 Class A operation per 1,000 objects (the max page size).
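To illustrate (my own sketch, not code from this answer), the same list-and-aggregate can be done with the google-cloud-storage Python client, which also gives you the regex filtering the question asks about; the bucket name, prefix, and pattern are placeholders:
import re
from google.cloud import storage

def du_matching(bucket_name, prefix, pattern):
    # Sum the sizes of objects whose names match the regex. The listing
    # response already carries each object's size, so no per-object request
    # is needed; the API returns up to 1,000 objects per page, i.e. one
    # Class A operation per 1,000 objects.
    client = storage.Client()
    regex = re.compile(pattern)
    return sum(
        blob.size
        for blob in client.list_blobs(bucket_name, prefix=prefix)
        if regex.search(blob.name)
    )

print(du_matching("my-bucket", "logs/", r"\.json$"))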
Upvotes: -1
Reputation: 966
I think gsutil du is the tool to use for this analysis. There is no faster way to do it.
But if you need to do this regularly, you may want to enable bucket logging:
You can read more about it here: https://cloud.google.com/storage/docs/access-logs#delivery
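Turning it on can also be scripted; a minimal sketch with the google-cloud-storage Python client (bucket names are placeholders, and the log bucket must already allow Cloud Storage to write objects into it):
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-bucket")  # placeholder source bucket
# Write usage/storage logs into the (placeholder) log bucket with this prefix.
bucket.enable_logging("example-logs-bucket", object_prefix="usage-log")
bucket.patch()  # persist the new logging configuration on the bucket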
As for the cost, it counts as a Class B operation:
https://cloud.google.com/storage/pricing
Upvotes: 1