ycseattle

Reputation: 3975

Check file size on S3 without downloading?

I have customer files uploaded to Amazon S3, and I would like to add a feature to count the size of those files for each customer. Is there a way to "peek" into the file size without downloading them? I know you can view it from the Amazon control panel, but I need to do it programmatically.

Upvotes: 112

Views: 153498

Answers (20)

Utsav10

Reputation: 629

AWS Java SDK v2.x version

Add this dependency:

<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3</artifactId>
    <version>[2.27.2, 3.0.0)</version>
</dependency>

You can use the s3Client's headObject API as follows:

public Long fetchObjectContentLength(String key) {
    HeadObjectResponse headObjectResponse =
        s3Client.headObject(builder -> builder.bucket(bucket).key(key));
    return Objects.nonNull(headObjectResponse) ? headObjectResponse.contentLength() : 0L;
}

Upvotes: 0

gil.fernandes

Reputation: 14621

This is a solution for whoever is using Java and the S3 Java library provided by Amazon.

If you are using com.amazonaws.services.s3.AmazonS3 you can use a GetObjectMetadataRequest request which allows you to query the object length.

The libraries you have to use are:

<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3 -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
    <version>1.11.511</version>
</dependency>

Imports:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;

And the code you need to get the content length:

GetObjectMetadataRequest metadataRequest = new GetObjectMetadataRequest(bucketName, fileName);
final ObjectMetadata objectMetadata = s3Client.getObjectMetadata(metadataRequest);
long contentLength = objectMetadata.getContentLength();

Before you can execute the code above, you will need to build the S3 client. Here is some example code for that:

AWSCredentials credentials = new BasicAWSCredentials(
            accessKey,
            secretKey
);
s3Client = AmazonS3ClientBuilder.standard()
            .withRegion(clientRegion)
            .withCredentials(new AWSStaticCredentialsProvider(credentials))
            .build();

Upvotes: 11

Mykola Melnyk

Reputation: 11

This is how I did it in Java AWS SDK v2.x

Hope this helps.

Region region = Region.EU_CENTRAL_1;
S3Client s3client = S3Client.builder().region(region).build();

String bucket = "s3-demo";

HeadObjectRequest headObjectRequest = HeadObjectRequest.builder()
        .bucket(bucket)
        .key(fileName)
        .build();
HeadObjectResponse headObjectResponse = s3client.headObject(headObjectRequest);
fileSize = headObjectResponse.contentLength();

Upvotes: 1

enharmonic

Reputation: 2118

If you are looking to do this with a single file, you can use aws s3api head-object to get the metadata only without downloading the file itself:

$ aws s3api head-object --bucket mybucket --key path/to/myfile.csv --query "ContentLength"

Explanation

  • s3api head-object retrieves the object metadata in JSON format
  • --query "ContentLength" filters the JSON response to get the size of the body in bytes
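The same metadata-only lookup can be sketched in Python. This is a minimal example that assumes s3_client is a boto3 S3 client (for instance boto3.client('s3')); the function name object_size_bytes is made up for illustration.

```python
def object_size_bytes(s3_client, bucket, key):
    """Return the size in bytes of one S3 object without downloading it.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # head_object issues a HEAD request: only metadata comes back,
    # the object body is never transferred.
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response["ContentLength"]
```

A call would look like object_size_bytes(boto3.client('s3'), 'mybucket', 'path/to/myfile.csv'). If the key does not exist, boto3 raises a botocore ClientError, so callers may want to catch that.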

Upvotes: 6

Jonny Rimek

Reputation: 1171

Golang example, same principle: run a HEAD request against the object in question:

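// svc is assumed to be an initialized *s3.S3 client (AWS SDK for Go v1).
func returnKeySizeInMB(bucketName string, key string) int {
    // HeadObject fetches only the object's metadata, not its body.
    output, err := svc.HeadObject(
        &s3.HeadObjectInput{
            Bucket: aws.String(bucketName),
            Key:    aws.String(key),
        })
    if err != nil {
        log.Fatalf("Unable to send head request to item %q, %v", key, err)
    }

    // ContentLength is in bytes; integer division truncates to whole MB.
    return int(*output.ContentLength / 1024 / 1024)
}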

Here, the parameter key means the path to the file.

For example, if the URI of the file is s3://my-personal-bucket/folder1/subfolder1/myfile.pdf, then the call would look like:

output, err := svc.HeadObject(
        &s3.HeadObjectInput{
            Bucket: aws.String("my-personal-bucket"),
            Key:    aws.String("folder1/subfolder1/myfile.pdf"),
        })

Upvotes: 0

Marcin

Reputation: 238797

These days you could also use Amazon S3 Inventory which gives you:

Size – The object size in bytes.

Upvotes: 0

Denis Viunyk

Reputation: 121

If the file is private, you can fetch its headers through the SDK.

PHP example:

$head = $client->headObject(
 [
   'Bucket' => $bucket,
   'Key' => $key,
 ]
);
$result = (int) ($head->get('ContentLength') ?? 0);

Upvotes: 1

kli

Reputation: 501

Ruby solution with head_object:

require 'aws-sdk-s3'

s3 = Aws::S3::Client.new(
  region:               'us-east-1',     #or any other region
  access_key_id:        AWS_ACCESS_KEY_ID,
  secret_access_key:    AWS_SECRET_ACCESS_KEY
)

res = s3.head_object(bucket: bucket_name, key: object_key)
file_size = res[:content_length]

Upvotes: 1

Atom

Reputation: 11

AWS C++ SDK solution to get the file size:

//! Step 1: create the S3 client
Aws::S3::S3Client s3Client(cred, config); //! Uses cred & config here; other constructor options work too.

//! Step 2: Head Object request
Aws::S3::Model::HeadObjectRequest headObj;
headObj.SetBucket(bucket);
headObj.SetKey(key);

//! Step 3: read size from object header metadata
auto object = s3Client.HeadObject(headObj);
if (object.IsSuccess())
{
    fileSize = object.GetResultWithOwnership().GetContentLength();
}
else
{
    std::cout << "Head Object error: "
        << object.GetError().GetExceptionName() << " - "
        << object.GetError().GetMessage() << std::endl;
}

Note: Do not use GetObject just to find the size; it downloads the file contents, whereas HeadObject returns only the metadata.

Upvotes: 1

Stephen C

Reputation: 791

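.NET AWS SDK, using ListObjectsRequest, ListObjectsResponse, and S3Object. Note that a single ListObjects call returns at most 1,000 objects, so larger buckets need pagination.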

AmazonS3Client s3 = new AmazonS3Client();
SpaceUsed(s3, "putBucketNameHere");

static void SpaceUsed(AmazonS3Client s3Client, string bucketName)
    {
        ListObjectsRequest request = new ListObjectsRequest();
        request.BucketName = bucketName;
        ListObjectsResponse response = s3Client.ListObjects(request);
        long totalSize = 0;
        foreach (S3Object o in response.S3Objects)
        {
            totalSize += o.Size;
        }
        Console.WriteLine("Total Size of bucket " + bucketName + " is " +
            Math.Round(totalSize / 1024.0 / 1024.0, 2) + " MB");
    }

Upvotes: 5

ninhjs.dev

Reputation: 8603

Node.js example:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

function sizeOf(key, bucket) {
    return s3.headObject({ Key: key, Bucket: bucket })
        .promise()
        .then(res => res.ContentLength);
}


// A test
sizeOf('ahihi.mp4', 'output').then(size => console.log(size));

Doc is here.

Upvotes: 63

Kyle Bridenstine

Reputation: 6393

You can simply use the s3 ls command:

aws s3 ls s3://mybucket --recursive --human-readable --summarize

Outputs

2013-09-02 21:37:53   10 Bytes a.txt
2013-09-02 21:37:53  2.9 MiB foo.zip
2013-09-02 21:32:57   23 Bytes foo/bar/.baz/a
2013-09-02 21:32:58   41 Bytes foo/bar/.baz/b
2013-09-02 21:32:57  281 Bytes foo/bar/.baz/c
2013-09-02 21:32:57   73 Bytes foo/bar/.baz/d
2013-09-02 21:32:57  452 Bytes foo/bar/.baz/e
2013-09-02 21:32:57  896 Bytes foo/bar/.baz/hooks/bar
2013-09-02 21:32:57  189 Bytes foo/bar/.baz/hooks/foo
2013-09-02 21:32:57  398 Bytes z.txt

Total Objects: 10
   Total Size: 2.9 MiB

Reference: https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html
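If you collect the sizes programmatically, you get raw byte counts rather than the formatted values shown above. The following is a small Python sketch of a similar formatting step; human_readable is a made-up helper, and it is not guaranteed to round exactly the way the CLI does.

```python
def human_readable(num_bytes):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    if num_bytes < 1024:
        return f"{num_bytes} Bytes"
    value = float(num_bytes)
    for unit in ("KiB", "MiB", "GiB", "TiB", "PiB"):
        # Step up one binary unit at a time until the value fits.
        value /= 1024
        if value < 1024 or unit == "PiB":
            return f"{value:.1f} {unit}"
```

This helper only changes how a size is displayed; the sizes themselves still come from a listing or HEAD request.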

Upvotes: 68

matt2000

Reputation: 1073

I do something like this in Python to get the cumulative size of all files under a given prefix:

import boto3

bucket = 'your-bucket-name'
prefix = 'some/s3/prefix/'

s3 = boto3.client('s3')

size = 0

result = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
# 'Contents' is missing from the response when no objects match the prefix
size += sum(x['Size'] for x in result.get('Contents', []))

while result['IsTruncated']:
    result = s3.list_objects_v2(
        Bucket=bucket, Prefix=prefix,
        ContinuationToken=result['NextContinuationToken'])
    size += sum(x['Size'] for x in result.get('Contents', []))

print('Total size in MB: ' + str(size / (1000**2)))

Upvotes: 6

tahir siddiqui

Reputation: 1621

The following Python code prints the size of each of the first 1,000 objects under a prefix (a single page of listing results):

import boto3

bucket = 'bucket_name'
prefix = 'prefix'

s3 = boto3.client('s3')
contents = s3.list_objects_v2(Bucket=bucket,  MaxKeys=1000, Prefix=prefix)['Contents']

for c in contents:
    print('Size (KB):', float(c['Size'])/1000)

Upvotes: 1

Ronak Patel

Reputation: 3444

There is a better solution.

$info = $s3->getObjectInfo($yourbucketName, $yourfilename);
print $info['size'];

Upvotes: 1

hannunehg

Reputation: 327

Android Solution

Integrate the AWS SDK and you get a fairly straightforward solution:

// ... put this in background thread
List<S3ObjectSummary> s3ObjectSummaries;
s3ObjectSummaries = s3.listObjects(registeredBucket).getObjectSummaries();
for (int i = 0; i < s3ObjectSummaries.size(); i++) {
    S3ObjectSummary s3ObjectSummary = s3ObjectSummaries.get(i);
    Log.d(TAG, "doInBackground: size " + s3ObjectSummary.getSize());
}

  • Here is a link to the official documentation.
  • It is very important to run this code in an AsyncTask or another background thread; otherwise Android throws an exception for doing network I/O on the UI thread.

Upvotes: 1

Ludo - Off the record

Reputation: 5543

PHP code to check an S3 object's size (or any other object headers). Note the use of stream_context_set_default to make sure it only sends a HEAD request:

stream_context_set_default(
            array(
                'http' => array(
                    'method' => 'HEAD'
                )
            )
        );

$headers = get_headers('http://s3.amazonaws.com/bucketname/filename.jpg', 1);
$headers = array_change_key_case($headers); 

$size = trim($headers['content-length'],'"'); 

Upvotes: 0

LennonR

Reputation: 1463

Using Michael's advice, my successful code looked like this:

require 'net/http'
require 'uri'

file_url = MyObject.first.file.url

url = URI.parse(file_url)
req = Net::HTTP::Head.new url.path
res = Net::HTTP.start(url.host, url.port) {|http|
  http.request(req)
}

file_length = res["content-length"]

Upvotes: 7

Ryan Parman

Reputation: 6945

You can also do a listing of the contents of the bucket. The metadata in the listing contains the file sizes of all of the objects. This is how it's implemented in the AWS SDK for PHP.
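As a rough illustration of the listing approach in Python (not the PHP SDK mentioned above), the sketch below sums object sizes under a prefix. It assumes s3_client is a boto3 S3 client and that each customer's files share a key prefix such as customers/42/, which is a hypothetical layout; total_size_bytes is a made-up name.

```python
def total_size_bytes(s3_client, bucket, prefix=""):
    """Sum the sizes of all objects under a prefix using list calls only.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # The paginator follows continuation tokens, so buckets with more
    # than 1,000 matching objects are still counted in full.
    paginator = s3_client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            total += obj["Size"]
    return total
```

For many objects this needs far fewer requests than one HEAD per file, since each list call returns the sizes of up to 1,000 objects.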

Upvotes: 1

Michael Dowling

Reputation: 5266

Send an HTTP HEAD request to the object. A HEAD request will retrieve the same HTTP headers as a GET request, but it will not retrieve the body of the object (saving you bandwidth). You can then parse out the Content-Length header value from the HTTP response headers.
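A minimal sketch of this in Python, using only the standard library. It assumes the URL is either publicly readable or a presigned URL generated for the HEAD method; content_length is a made-up helper name.

```python
import urllib.request


def content_length(url):
    """Return the Content-Length reported by a HEAD request, or None."""
    # method="HEAD" asks the server for headers only, so no body is sent back.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None
```

For private objects, an SDK call that signs the HEAD request for you is usually simpler than building the signed request by hand.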

Upvotes: 85
