Reputation: 159
Assume there is a bucket with a root folder that has subfolders and files. Is there any way to get the total file count and the total size of the root folder?
What I tried:
With gsutil du I'm getting the size quickly, but not the count. With gsutil ls ___ I'm getting the list and the sizes; if I pipe it to awk and sum them I might get the expected result, but ls itself takes a lot of time.
So is there a better/faster way to handle this?
Upvotes: 4
Views: 8208
Reputation: 3974
You can use gsutil du -sh, which could be a good idea for small directories. For big directories I was not able to get a result, even after a few hours, only a retrying message.

You can also use gsutil ls, which is more efficient. For big directories it could take tens of minutes, but at least it completes.

To retrieve the number of files and the total size of a directory with gsutil ls, you can use the following command:

gsutil ls -l gs://bucket/dir/** | awk '{size+=$1} END {print "nb_files:", NR, "\ntotal_size:",size,"B"}'

Then divide the value by 1024, 1024², or 1024³ to get the size in KiB, MiB, or GiB.
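If you prefer to let awk do the conversion itself, a variant of the same command (a sketch, assuming the same listing output as above) prints the total directly in GiB:

gsutil ls -l gs://bucket/dir/** | awk '{size+=$1} END {printf "nb_files: %d\ntotal_size: %.2f GiB\n", NR, size/1024/1024/1024}'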
Upvotes: 0
Reputation: 2593
Doing an object listing of some sort is the way to go - both the ls and du commands in gsutil perform object listing API calls under the hood.

If you want to get a summary of all objects in a bucket, check Cloud Monitoring (as mentioned in the docs). But this isn't applicable if you want statistics for a subset of objects - GCS doesn't support actual "folders", so all your objects under the "folder" foo are actually just objects named with a common prefix, foo/.
If you want to analyze the number of objects under a given prefix, you'll need to perform object listing API calls (either using a client library or using gsutil). The listing operations can only return so many objects per response and thus are paginated, meaning you'll have to make several calls if you have lots of objects under the desired prefix. The max number of results per listing call is currently 1,000. So as an example, if you had 200,000 objects to list, you'd have to make 200 sequential API calls.
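As a rough illustration of that pagination, here is a sketch using the Python client library (bucket and prefix names are placeholders mirroring the example below) that walks the listing page by page and counts how many API calls it takes:

from google.cloud import storage

client = storage.Client()
# page_size caps how many results come back per underlying listing API call (max 1,000)
blobs = client.list_blobs("my-bucket", prefix="some-prefix/", page_size=1000)

api_calls = 0
total_objects = 0
for page in blobs.pages:  # each page corresponds to one listing API call
    api_calls += 1
    total_objects += page.num_items
print(f"{total_objects} objects listed in {api_calls} API calls")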
Regarding ls: there are several scenarios in which gsutil can do "extra" work when completing an ls command, like when doing a "long" listing using the -L flag or performing recursive listings using the -r flag. To save time and perform the fewest number of listings possible in order to obtain a total count of bytes under some prefix, you'll want to do a "flat" listing using gsutil's wildcard support, e.g.:
gsutil ls -l gs://my-bucket/some-prefix/**
Alternatively, you could try writing a script using one of the GCS client libraries, like the Python library and its list_blobs functionality.
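For example, a minimal sketch with the google-cloud-storage Python package (assuming default application credentials and the placeholder bucket/prefix names from above) that counts objects and sums their sizes under a prefix:

from google.cloud import storage

client = storage.Client()
count = 0
total_bytes = 0
# list_blobs transparently handles the pagination described above
for blob in client.list_blobs("my-bucket", prefix="some-prefix/"):
    count += 1
    total_bytes += blob.size or 0
print(f"gs://my-bucket/some-prefix/ contains {count} objects, {total_bytes} bytes")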
Upvotes: 3
Reputation: 38389
If you want to track the count of objects in a bucket over a long time, Cloud Monitoring offers the metric "storage/object_count". The metric updates about once per day, which makes it more useful for long-term trends.
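As a sketch of reading that metric with the Cloud Monitoring Python client (monitoring_v3), with a placeholder project ID and bucket name; the full metric type is storage.googleapis.com/storage/object_count:

import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
# Look back two days, since the metric is only written about once per day.
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 2 * 86400}}
)
results = client.list_time_series(
    request={
        "name": "projects/my-project",
        "filter": 'metric.type = "storage.googleapis.com/storage/object_count" '
                  'AND resource.labels.bucket_name = "my-bucket"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value.int64_value)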
As for counting instantaneously, unfortunately gsutil ls is probably your best bet.
Upvotes: 2