jz22

Reputation: 2638

Can't get total size of a bucket with Boto3

I'm trying to get the total size of a bucket, but total_size ends up as 0. The bucket definitely contains files: if it holds five files, the following code prints five zeros. What am I doing wrong?

import boto3
from botocore.client import Config

bucket = boto3.resource('s3', config=Config(signature_version="s3", s3={'addressing_style': 'path'})).Bucket(name)
for object in bucket.objects.all():
    total_size += object.size
    print(object.size)

Upvotes: 6

Views: 16540

Answers (7)

dsciacca8930

Reputation: 15

Here's my solution, similar to @Rohit G's, except that it accounts for list_objects being deprecated in favor of list_objects_v2, and for the fact that list_objects_v2 returns at most 1000 keys per call (list_objects has the same limit, so @Rohit G's solution, if used, should be updated to account for this as well - source).

I also included logic for specifying a prefix, in case anyone wants just the size of a particular prefix in the bucket; as written, it gets the size of the entire bucket:

import boto3

s3 = boto3.client('s3')
bucket = 'myBucket'
prefix = ''

# list_objects_v2 returns at most 1000 keys per call; follow the
# continuation token until every page has been counted
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
total_size = sum(obj['Size'] for obj in resp.get('Contents', []))
while resp.get('NextContinuationToken'):
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix,
                              ContinuationToken=resp['NextContinuationToken'])
    total_size += sum(obj['Size'] for obj in resp.get('Contents', []))
print(f"Size (bytes): {total_size}")

Upvotes: 0

Ori N

Reputation: 748

I wrote a Python function that returns the bucket size using the daily BucketSizeBytes metric stored in CloudWatch:

from datetime import datetime, timedelta

import boto3

def get_bucket_size(bucket_name: str, region: str):
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    # BucketSizeBytes is published once a day, so look back two days
    # to be sure at least one datapoint exists
    result = cloudwatch.get_metric_statistics(
        Namespace="AWS/S3",
        Dimensions=[{"Name": "BucketName", "Value": bucket_name},
                    {"Name": "StorageType", "Value": "StandardStorage"}],
        MetricName="BucketSizeBytes",
        StartTime=datetime.now() - timedelta(days=2),
        EndTime=datetime.now(),
        Period=86400,
        Statistics=["Average"],
    )
    return result["Datapoints"][0]["Average"]
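
For example (the bucket name and region here are placeholders):

size_bytes = get_bucket_size("myBucket", "us-east-1")
print(f"bucket size: {size_bytes / 1024**3:.2f} GiB")

Note that the StorageType dimension above only counts the StandardStorage class; other storage classes are reported under their own dimension values.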

Upvotes: 1

LucyDrops

Reputation: 609

You can use this to get the size in GB:

import boto3

s3 = boto3.resource('s3')
# avoid shadowing the built-in `bytes`
total_bytes = sum(obj.size for obj in s3.Bucket('myBucket').objects.all())
print(f'total bucket size: {total_bytes / 1024**3:.2f} GB')

Upvotes: 1

Rohit G

Reputation: 107

I am using this:

s3client = boto3.client('s3', region_name=region,
                        aws_access_key_id=access_key,
                        aws_secret_access_key=secret_key)
# note: list_objects returns at most 1000 keys per call
response = s3client.list_objects(Bucket=bucket_name)['Contents']
bucket_size = sum(obj['Size'] for obj in response)

Upvotes: 2

John Rotenstein

Reputation: 269091

A simpler alternative is to use Amazon S3 Inventory to dump a list of objects on a daily basis, then calculate the totals from that.
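
For the totalling step, a minimal sketch, assuming the inventory report was configured as CSV with Bucket, Key, and Size fields and has already been downloaded (the file name and column layout are assumptions; they depend on your inventory configuration):

import csv

total_size = 0
# S3 Inventory CSV reports have no header row; the column order
# matches the fields chosen when the inventory was configured
with open('inventory.csv', newline='') as f:
    for bucket_name, key, size in csv.reader(f):
        total_size += int(size)
print(f"Size (bytes): {total_size}")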

Upvotes: 0

John Hanley

Reputation: 81336

Change signature_version="s3" to signature_version="s3v4".

I also like helloV's answer.

Also specify the region for the bucket instead of relying on the default configuration.
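
Putting both suggestions together, the question's resource call would look something like this (the region is a placeholder):

import boto3
from botocore.client import Config

bucket = boto3.resource(
    's3',
    region_name='us-east-1',  # placeholder: use your bucket's region
    config=Config(signature_version='s3v4', s3={'addressing_style': 'path'}),
).Bucket(name)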

Upvotes: 1

helloV

Reputation: 52375

I see a few issues:

  • Not sure about your call to boto3.resource(). Is that correct?
  • total_size is not initialized

Try this:

import boto3

total_size = 0
bucket = boto3.resource('s3').Bucket('mybucket')
for object in bucket.objects.all():
    total_size += object.size
    print(object.size)
print(total_size)

Or a one-liner:

sum(object.size for object in boto3.resource('s3').Bucket('mybucket').objects.all())

Upvotes: 11
