Reputation: 2638
I'm trying to get the total size of a bucket, but total_size comes back as 0. There are definitely files in the bucket; if I have five files, the following code prints five zeros. What am I doing wrong?
bucket = boto3.resource('s3', config=Config(signature_version="s3", s3={'addressing_style': 'path'})).Bucket(name)
for object in bucket.objects.all():
    total_size += object.size
    print(object.size)
Upvotes: 6
Views: 16540
Reputation: 15
Here's my solution, similar to @Rohit G's, except it accounts for list_objects being deprecated in favor of list_objects_v2, and for the fact that list_objects_v2 returns at most 1000 keys per call (list_objects has the same limit, so @Rohit G's solution, if used, should be updated to account for this (source)).
I also included logic for specifying a prefix, should anyone want the size of just a particular prefix in the bucket; as written, it gets the size of the entire bucket:
import boto3

s3 = boto3.client('s3')
bucket = 'myBucket'
prefix = ''  # empty prefix sums the whole bucket; set one to restrict the count

resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
total_size = sum(obj.get('Size') for obj in resp.get('Contents', []))

# list_objects_v2 returns at most 1000 keys per call, so keep following
# the continuation token until the listing is exhausted.
while resp.get('NextContinuationToken'):
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix,
                              ContinuationToken=resp.get('NextContinuationToken'))
    total_size += sum(obj.get('Size') for obj in resp.get('Contents', []))

print(f"Size (bytes): {total_size}")
Upvotes: 0
Reputation: 748
I wrote a Python function that returns the bucket size using the daily BucketSizeBytes metric stored in CloudWatch:
import boto3
from datetime import datetime, timedelta

def get_bucket_size(bucket_name: str, region: str):
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    result = cloudwatch.get_metric_statistics(
        Namespace="AWS/S3",
        Dimensions=[{"Name": "BucketName", "Value": bucket_name},
                    {"Name": "StorageType", "Value": "StandardStorage"}],
        MetricName="BucketSizeBytes",
        StartTime=datetime.now() - timedelta(2),
        EndTime=datetime.now(),
        Period=86400,  # one day, matching the daily resolution of the metric
        Statistics=['Average'],
    )
    return result["Datapoints"][0]["Average"]
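One caveat: the metric is published roughly once per day, and the code above only queries the StandardStorage class, so Datapoints can come back empty for a new or empty bucket. A hypothetical call:

# "myBucket" is a placeholder; this raises IndexError if no datapoint exists yet.
print(get_bucket_size("myBucket", "us-east-1"))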
Upvotes: 1
Reputation: 609
You can use this to get the size in GB:
import boto3

s3 = boto3.resource('s3')
total_bytes = sum(obj.size for obj in s3.Bucket('myBucket').objects.all())
# 1 GiB = 1024**3 bytes; divide by 10**9 instead if you want decimal gigabytes
print(f'total bucket size: {total_bytes / 1024**3} GB')
Upvotes: 1
Reputation: 107
I am using this:
import boto3

s3client = boto3.client('s3', region_name=region,
                        aws_access_key_id=access_key,
                        aws_secret_access_key=secret_key)

# Note: list_objects returns at most 1000 keys, so this undercounts larger
# buckets; see the list_objects_v2 answer above for how to paginate.
response = s3client.list_objects(Bucket=bucket_name)['Contents']
bucket_size = sum(obj['Size'] for obj in response)
Upvotes: 2
Reputation: 269091
A simpler alternative is to use Amazon S3 Inventory to dump a list of objects on a daily basis, then calculate the totals from that.
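For illustration only, here's a minimal sketch of totalling sizes from one downloaded inventory data file, assuming the inventory was configured with CSV output and includes the Size field; the local file name and the column index are hypothetical, so check the inventory's manifest.json fileSchema for the real layout:

import csv
import gzip

# Hypothetical local copy of one inventory data file. Inventory CSVs are
# gzip-compressed and have no header row; the Size column index comes from
# the manifest's fileSchema (assumed here to be column 2).
total_size = 0
with gzip.open('inventory-data.csv.gz', mode='rt', newline='') as f:
    for row in csv.reader(f):
        total_size += int(row[2])

print(f"Size (bytes): {total_size}")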
Upvotes: 0
Reputation: 81336
Change signature_version="s3" to signature_version="s3v4".
I also like helloV's answer.
Also specify the region for the bucket instead of relying on the default configuration.
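A minimal sketch combining both suggestions, assuming a bucket in us-east-1 (the region and bucket name are placeholders):

import boto3
from botocore.client import Config

# Use the v4 signature and pin the bucket's region explicitly.
s3 = boto3.resource('s3',
                    region_name='us-east-1',
                    config=Config(signature_version='s3v4',
                                  s3={'addressing_style': 'path'}))
bucket = s3.Bucket('myBucket')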
Upvotes: 1
Reputation: 52375
I see a few issues:
- You're using boto3.resource(). Is that correct?
- total_size is not initialized.
Try this:
import boto3

total_size = 0
bucket = boto3.resource('s3').Bucket('mybucket')
for object in bucket.objects.all():
    total_size += object.size
    print(object.size)
print(total_size)
Or a one-liner:
sum(object.size for object in boto3.resource('s3').Bucket('mybucket').objects.all())
Upvotes: 11