fredrik

Reputation: 10291

Fastest way to get Google Storage bucket size?

I'm currently doing this, but it's VERY slow since I have several terabytes of data in the bucket:

gsutil du -sh gs://my-bucket-1/

And the same for a sub-folder:

gsutil du -sh gs://my-bucket-1/folder

Is it possible to somehow obtain the total size of a complete bucket (or a sub-folder) elsewhere or in some other fashion which is much faster?

Upvotes: 65

Views: 54127

Answers (10)

red888

Reputation: 31662

The visibility for Google Storage here is pretty poor.

The fastest way is actually to pull the Stackdriver metrics and look at the total size in bytes.

Unfortunately there is practically no filtering you can do in Stackdriver: you can't wildcard the bucket name, and the almost useless bucket resource labels are NOT aggregatable in Stackdriver metrics.

Also, this is bucket level only, not prefixes.

The Stackdriver metrics are only updated daily, so unless you can wait a day, you can't use this to get the current size right now.

Update

Stackdriver metrics now support user metadata labels, so you can label your GCS buckets and aggregate those metrics by the custom labels you apply.
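
If you'd rather pull the same metric programmatically than read it off a dashboard, something along these lines should work with the google-cloud-monitoring client (the project ID and the one-day lookback are placeholders; remember the metric is only written about once a day):

from google.cloud import monitoring_v3
import time

project_id = "my-project"  # placeholder
client = monitoring_v3.MetricServiceClient()

# Look back one day, since the metric is only written about once a day.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 86400}}
)

results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": 'metric.type="storage.googleapis.com/storage/total_bytes"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

# One time series per bucket (and storage class); points are newest first.
for series in results:
    bucket = series.resource.labels["bucket_name"]
    size_bytes = series.points[0].value.double_value
    print(f"{bucket}: {size_bytes / 1024**3:.2f} GiB")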

Edit

I want to add a word of warning if you are creating monitors off of this metric. There is a really serious bug with this metric right now.

GCP occasionally has platform issues that cause this metric to stop being written. I think it's tenant-specific (maybe?), so you also won't see it on their public health status pages. It also seems poorly documented for their internal support staff, because every time we open a ticket to complain they seem to think we are lying, and it takes some back and forth before they even acknowledge it's broken.

I think this happens if you have many buckets and something crashes on their end and stops writing metrics to your projects. While it does not happen all the time, we see it several times a year.

For example, it just happened to us again; across all our projects, Stackdriver is currently showing no data for this metric.

Response from GCP support

Just adding the last response we got from GCP support during this most recent metric outage. I'll add that all our buckets were accessible; it was just that this metric was not being written:

The product team concluded their investigation stating that this was indeed a widespread issue, not tied to your projects only. This internal issue caused unavailability for some GCS buckets, which was affecting the metering systems directly, thus the reason why the "GCS Bucket Total Bytes" metric was not available.

Upvotes: 42

Vadiraj k.s

Reputation: 59

I guess pulling the metric from GCP is a better approach than using gsutil to get the bucket size.

#!/bin/bash
# Query the Cloud Monitoring API for storage.googleapis.com/storage/total_bytes
# and print one line per bucket: "bucket","location","storage_class",size
PROJECT_ID='<<PROJECT_ID>>'
ACCESS_TOKEN="$(gcloud auth print-access-token)"
CHECK_TIME=10   # how many minutes to look back
STARTTIME=$(date --date="${CHECK_TIME} minutes ago" -u +"%Y-%m-%dT%H:%M:%SZ")
ENDTIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
# URL-encode the metric filter and the interval timestamps
FILTER="$( echo -n 'metric.type="storage.googleapis.com/storage/total_bytes"' | ruby -n -r 'cgi' -e 'print(CGI.escape($_))' )"
START="$( echo -n "${STARTTIME}" | ruby -n -r 'cgi' -e 'print(CGI.escape($_))' )"
END="$( echo -n "${ENDTIME}" | ruby -n -r 'cgi' -e 'print(CGI.escape($_))' )"
DETAILS=$(curl -s -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  "https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/timeSeries/?filter=${FILTER}&interval.startTime=${START}&interval.endTime=${END}")
# One CSV row per time series, sorted by size (largest first)
for i in $(echo "$DETAILS" | jq -r ".timeSeries[]|[.resource.labels.bucket_name,.resource.labels.location,.metric.labels.storage_class,.points[0].value.doubleValue]|@csv" | sort -t, -k4,4nr); do
  f1=${i%,*}    # bucket, location and storage class
  f2=${i##*,}   # size in bytes
  size=$(numfmt --to=iec-i --suffix=B --format="%9.2f" "$f2")
  echo "$f1,$size"
done

Upvotes: 0

Vitek

Reputation: 71

  • To include files in subfolders:

    gsutil ls -l -R gs://${bucket_name}

  • This calculates the size (in TiB) of all files in all buckets:

    for bucket_name in $(gcloud storage buckets list "--format=value(name)"); do echo "$bucket_name;$(gsutil ls -l -R gs://${bucket_name})"; done | grep TOTAL | awk '{s+=$4}END{print s/1024/1024/1024/1024}'

Upvotes: 1

dan carter

Reputation: 4361

Use the built-in dashboard: Operations -> Monitoring -> Dashboards -> Cloud Storage

The graph at the bottom shows the bucket size for all buckets, or you can select an individual bucket to drill down.

Note that the metric is only updated once per day.

[screenshot: object size graph]

Upvotes: 8

Sander van den Oord

Reputation: 12838

With Python you can get the size of your bucket as follows:

from google.cloud import storage

storage_client = storage.Client()
blobs = storage_client.list_blobs(bucket_or_name='name_of_your_bucket')

blobs_total_size = 0
for blob in blobs:
    blobs_total_size += blob.size  # size in bytes

print(blobs_total_size / (1024 ** 3))  # size in GiB
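
To cover the sub-folder case from the question, list_blobs also accepts a prefix; a minimal sketch (the bucket and prefix names are placeholders):

from google.cloud import storage

storage_client = storage.Client()

# Only objects whose names start with this prefix ("sub-folder") are listed.
blobs = storage_client.list_blobs(bucket_or_name='name_of_your_bucket', prefix='folder/')

folder_size_bytes = sum(blob.size for blob in blobs)
print(folder_size_bytes / (1024 ** 3))  # size in GiB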

Upvotes: 3

NAW

Reputation: 41

Google Console

Platform -> Monitoring -> Dashboard -> Select the bucket

Scroll down to see the object size for that bucket.

Upvotes: 4

Anton Kumpan

Reputation: 344

For me, the following command helped:

gsutil ls -l gs://{bucket_name}

It then gives output like this after listing all files:

TOTAL: 6442 objects, 143992287936 bytes (134.1 GiB)

Upvotes: 0

needcaffeine

Reputation: 11

I found that the CLI was frequently timing out, but that may be because I was reviewing Coldline storage.

For a GUI solution, look at Cloudberry Explorer.

[screenshot: GUI view of storage]

Upvotes: 1

Mike Schwartz

Reputation: 12155

If the daily storage log you get from enabling bucket logging (per Brandon's suggestion) won't work for you, one thing you could do to speed things up is to shard the du request. For example, you could do something like:

# run one du per prefix in parallel, each writing its byte count to a file
gsutil du -s gs://my-bucket-1/a* > a.size &
gsutil du -s gs://my-bucket-1/b* > b.size &
...
gsutil du -s gs://my-bucket-1/z* > z.size &
wait                                     # wait for all the background shards to finish
awk '{sum+=$1} END {print sum}' *.size   # add up the per-shard totals (bytes)

(assuming your subfolders are named starting with letters of the English alphabet; if not, you'd need to adjust how you run the above commands).

Upvotes: 17

Brandon Yarbrough

Reputation: 38399

Unfortunately, no. If you need to know what size the bucket is right now, there's no faster way than what you're doing.

If you need to check on this regularly, you can enable bucket logging. Google Cloud Storage will generate a daily storage log that you can use to check the size of the bucket. If that would be useful, you can read more about it here: https://cloud.google.com/storage/docs/accesslogs#delivery
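
Once logging is enabled, the daily storage log is just a small CSV dropped into the log bucket you configure; here's a rough sketch of turning it into a size, assuming the documented storage-log fields and placeholder bucket/object names:

import csv
from google.cloud import storage

# Placeholders: the bucket that receives the logs and one daily storage log object.
LOG_BUCKET = "my-log-bucket"
LOG_OBJECT = "my-bucket-1_storage_2015_01_15_08_00_00_0000_v0"

client = storage.Client()
text = client.bucket(LOG_BUCKET).blob(LOG_OBJECT).download_as_text()

# The storage log has "bucket" and "storage_byte_hours" columns; dividing
# byte-hours by 24 gives the average size in bytes over that day.
for row in csv.DictReader(text.splitlines()):
    avg_bytes = int(row["storage_byte_hours"]) / 24
    print(row["bucket"], f"{avg_bytes / 1024 ** 3:.2f} GiB")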

Upvotes: 25
