Reputation: 4653
Unless I'm missing something, it seems that none of the APIs I've looked at will tell you how many objects are in an <S3 bucket>/<folder>. Is there any way to get a count?
Upvotes: 248
Views: 305043
Reputation: 111
With AWS CLI updates and CloudWatch changes, the CLI syntax that works for me (as of April 2023) is:
aws --profile cloudwatch get-metric-statistics --namespace AWS/S3
--metric-name NumberOfObjects
--dimensions Name=BucketName,Value= Name=StorageType,Value=AllStorageTypes
--start-time --end-time --period 86400 --statistic Average
Since the S3 stats are 24-hour data points, you have to use start and end times that are days apart, with a period of 86400. You can pull a series of data points, but CloudWatch will return them in random order; therefore, add
--query 'sort_by(Datapoints, &Timestamp)'
to the end of the command to get the results sorted in order.
aws --profile cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name NumberOfObjects
--dimensions Name=BucketName,Value= Name=StorageType,Value=AllStorageTypes
--start-time --end-time --period 86400 --statistic Average --query 'sort_by(Datapoints, &Timestamp)'
Upvotes: 3
Reputation: 965
You can use the command below, provided that you replace the bucket path, as it is a template (it uses the default profile; otherwise add --profile {aws_profile}):
aws s3 ls s3://{bucket}/{folder} --recursive --no-paginate --summarize
The point is that you have to include the --summarize option so that it prints both the total size and the number of objects at the end. Also, don't forget to disable pagination using --no-paginate, since you want this calculation for the whole bucket/folder.
Upvotes: 1
Reputation: 22332
Look at the Metrics tab on your bucket
or:
Look at AWS CloudWatch's metrics
or:
aws s3api list-objects --bucket <BUCKET_NAME> --prefix "<FOLDER_NAME>" | wc -l
or:
aws s3 ls s3://<BUCKET_NAME>/<FOLDER_NAME>/ --recursive --summarize --human-readable | grep "Total Objects"
or with s4cmd:
s4cmd ls -r s3://<BUCKET_NAME>/<FOLDER_NAME>/ | wc -l
or, to get both the total size and the object count:
aws s3api list-objects --bucket <BUCKET_NAME> --output json --query "[sum(Contents[].Size), length(Contents[])]" | awk 'NR!=2 {print $0;next} NR==2 {print $0/1024/1024/1024" GB"}'
or:
aws s3 ls s3://<BUCKET_NAME>/<FOLDER_NAME>/ --recursive --summarize --human-readable | grep "Total Size"
or with s4cmd:
s4cmd du s3://<BUCKET_NAME>
or with CloudWatch metrics:
aws cloudwatch get-metric-statistics --metric-name BucketSizeBytes --namespace AWS/S3 --start-time 2020-10-20T16:00:00Z --end-time 2020-10-22T17:00:00Z --period 3600 --statistics Average --unit Bytes --dimensions Name=BucketName,Value=<BUCKET_NAME> Name=StorageType,Value=StandardStorage --output json | grep "Average"
Upvotes: 22
Reputation: 1149
Here's the boto3 version of the boto Python script from another answer on this page.
import sys

import boto3

s3 = boto3.resource("s3")
s3bucket = s3.Bucket(sys.argv[1])
size = 0
totalCount = 0

# Walk every object in the bucket, tallying the count and total bytes.
for key in s3bucket.objects.all():
    totalCount += 1
    size += key.size

print("total size:")
print("%.3f GB" % (size * 1.0 / 1024 / 1024 / 1024))
print("total count:")
print(totalCount)
Upvotes: 1
Reputation: 4025
There is a --summarize switch that shows bucket summary information (i.e. number of objects, total size).
Here's the correct answer using AWS cli:
aws s3 ls s3://bucketName/path/ --recursive --summarize | grep "Total Objects:"
Total Objects: 194273
See the documentation
Upvotes: 216
Reputation: 1195
The issue @Mayank Jaiswal mentioned about using CloudWatch metrics should not actually be an issue. If you aren't getting results, your range might just not be wide enough. It's currently Nov 3, and I wasn't getting results no matter what I tried. I went to the S3 bucket, looked at the counts, and the last record for the "Total number of objects" count was Nov 1.
So here is what the CloudWatch solution looks like using the JavaScript aws-sdk:
import aws from 'aws-sdk';
import { startOfMonth } from 'date-fns';

const region = 'us-east-1';
const profile = 'default';
const credentials = new aws.SharedIniFileCredentials({ profile });
aws.config.update({ region, credentials });

export const main = async () => {
  const cw = new aws.CloudWatch();
  const bucket_name = 'MY_BUCKET_NAME';
  const end = new Date();
  const start = startOfMonth(end);
  const results = await cw
    .getMetricStatistics({
      // @ts-ignore
      Namespace: 'AWS/S3',
      MetricName: 'NumberOfObjects',
      Period: 3600 * 24,
      StartTime: start.toISOString(),
      EndTime: end.toISOString(),
      Statistics: ['Average'],
      Dimensions: [
        { Name: 'BucketName', Value: bucket_name },
        { Name: 'StorageType', Value: 'AllStorageTypes' },
      ],
      Unit: 'Count',
    })
    .promise();
  console.log({ results });
};

main()
  .then(() => console.log('Done.'))
  .catch((err) => console.error(err));
Notice two things: the StorageType dimension is AllStorageTypes, and the period is a full day (3600 * 24 seconds), since S3 only reports this metric daily.
Upvotes: 2
Reputation: 6589
This information is now surfaced in the AWS dashboard. Simply navigate to the bucket and click the Metrics tab.
Upvotes: 16
Reputation:
Select the bucket/folder -> click on Actions -> click on Calculate Total Size
Upvotes: 3
Reputation: 1853
As of November 18, 2020, there is an easier way to get this information without taxing your API requests:
The default, built-in, free dashboard allows you to see the count for all buckets, or for individual buckets, under the "Buckets" tab. There are many drop-downs to filter and sort almost any reasonable metric you would look for.
Upvotes: 6
Reputation: 400
One of the simplest ways to count the number of objects in S3 is:
Step 1: Select the root folder.
Step 2: Click on Actions -> Delete (obviously, be careful - don't actually delete it).
Step 3: Wait a few minutes and AWS will show you the number of objects and their total size.
Upvotes: 5
Reputation: 35
aws s3 ls s3://bucket-name/folder-prefix-if-any --recursive | wc -l
Upvotes: 1
Reputation: 4262
Here is how you can do it using the Java client.
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
    <version>1.11.519</version>
</dependency>

import com.amazonaws.ClientConfiguration;
import com.amazonaws.Protocol;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectListing;

public class AmazonS3Service {

    private static final String S3_ACCESS_KEY_ID = "ACCESS_KEY";
    private static final String S3_SECRET_KEY = "SECRET_KEY";
    private static final String S3_ENDPOINT = "S3_URL";

    private AmazonS3 amazonS3;

    public AmazonS3Service() {
        ClientConfiguration clientConfiguration = new ClientConfiguration();
        clientConfiguration.setProtocol(Protocol.HTTPS);
        clientConfiguration.setSignerOverride("S3SignerType");
        BasicAWSCredentials credentials = new BasicAWSCredentials(S3_ACCESS_KEY_ID, S3_SECRET_KEY);
        AWSStaticCredentialsProvider credentialsProvider = new AWSStaticCredentialsProvider(credentials);
        AmazonS3ClientBuilder.EndpointConfiguration endpointConfiguration = new AmazonS3ClientBuilder.EndpointConfiguration(S3_ENDPOINT, null);
        amazonS3 = AmazonS3ClientBuilder.standard().withCredentials(credentialsProvider).withClientConfiguration(clientConfiguration)
                .withPathStyleAccessEnabled(true).withEndpointConfiguration(endpointConfiguration).build();
    }

    // Pages through the listing in batches (at most 1000 keys each) and sums the batch sizes.
    public int countObjects(String bucketName) {
        int count = 0;
        ObjectListing objectListing = amazonS3.listObjects(bucketName);
        int currentBatchCount = objectListing.getObjectSummaries().size();
        while (currentBatchCount != 0) {
            count += currentBatchCount;
            objectListing = amazonS3.listNextBatchOfObjects(objectListing);
            currentBatchCount = objectListing.getObjectSummaries().size();
        }
        return count;
    }
}
Upvotes: 0
Reputation: 8032
If you're looking for specific files, let's say .jpg images, you can do the following:
aws s3 ls s3://your_bucket | grep jpg | wc -l
Upvotes: 0
Reputation: 61
You can easily get the total count and the history if you go to the S3 console's "Management" tab and then click on "Metrics".
Upvotes: 6
Reputation: 181
You can just execute this CLI command to get the total file count in a bucket or in a specific folder.
Scan the whole bucket:
aws s3api list-objects-v2 --bucket BUCKET_NAME | grep "Key" | wc -l
You can use this command to get the full details:
aws s3api list-objects-v2 --bucket BUCKET_NAME
Scan a specific folder:
aws s3api list-objects-v2 --bucket BUCKET_NAME --prefix FOLDER_NAME --start-after FOLDER_NAME/ | grep "Key" | wc -l
Upvotes: 4
Reputation: 14124
This can also be done with gsutil du (yes, a Google Cloud tool):
gsutil du s3://mybucket/ | wc -l
Upvotes: 0
Reputation: 2729
If you are using the AWS CLI on Windows, you can use Measure-Object from PowerShell to get the total count of files, just like wc -l on *nix.
PS C:\> aws s3 ls s3://mybucket/ --recursive | Measure-Object
Count : 25
Average :
Sum :
Maximum :
Minimum :
Property :
Hope it helps.
Upvotes: 7
Reputation: 707
You can potentially use Amazon S3 inventory that will give you list of objects in a csv file
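If you go that route, counting the data rows of the delivered inventory file gives the object count, since each row describes one object. Here's a minimal Python sketch of that idea, assuming a gzipped CSV inventory has already been configured and delivered; the destination bucket and key names are hypothetical:
import csv
import gzip
import io

import boto3

s3 = boto3.client("s3")

# Hypothetical location of one delivered inventory part; S3 Inventory
# writes gzipped CSV files under the destination prefix you configure.
DEST_BUCKET = "my-inventory-destination"
INVENTORY_KEY = "my-bucket/daily-inventory/data/part-00000.csv.gz"

obj = s3.get_object(Bucket=DEST_BUCKET, Key=INVENTORY_KEY)
with gzip.open(io.BytesIO(obj["Body"].read()), mode="rt") as f:
    # Each data row in the inventory CSV describes one object.
    count = sum(1 for _ in csv.reader(f))

print("objects listed in this inventory part:", count)
Large buckets are split across several part files listed in the inventory manifest, so you would sum the counts across all parts.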
Upvotes: 0
Reputation: 2017
Although this is an old question, and feedback was provided in 2015, right now it's much simpler, as the S3 web console has a "Get Size" option that reports the total size and object count for the selection.
Upvotes: 94
Reputation: 91
From the command line in the AWS CLI, use ls plus --summarize. It will give you the list of all of your items and the total number of documents in a particular bucket. I have not tried this with buckets containing sub-folders:
aws s3 ls "s3://MyBucket" --summarize
It may take a while (it took about 4 minutes to list my 16+K documents), but it's faster than counting 1K at a time.
Upvotes: 8
Reputation: 13106
aws s3 ls s3://mybucket/ --recursive | wc -l
or
aws cloudwatch get-metric-statistics \
--namespace AWS/S3 --metric-name NumberOfObjects \
--dimensions Name=BucketName,Value=BUCKETNAME \
Name=StorageType,Value=AllStorageTypes \
--start-time 2016-11-05T00:00 --end-time 2016-11-05T00:10 \
--period 60 --statistic Average
Note: The above CloudWatch command seems to work for some while not for others. Discussed here: https://forums.aws.amazon.com/thread.jspa?threadID=217050
You can also look at CloudWatch's metrics section to get the approximate number of objects stored.
I have approximately 50 million products, and it took more than an hour to count them using aws s3 ls.
Upvotes: 402
Reputation: 59
The easiest way is to use the developer console. For example, if you are on Chrome, choose Developer Tools, where you can either find and count the listed objects, or do some math on the row numbers, like 280 - 279 + 1 = 2.
Upvotes: -1
Reputation: 439
You can download and install S3 Browser from http://s3browser.com/. When you select a bucket, the number of files in it is shown in the center-right corner. But the size it shows is incorrect in the current version.
Gubs
Upvotes: 0
Reputation: 4409
With s3cmd, simply run the following command (on an Ubuntu system):
s3cmd ls -r s3://mybucket | wc -l
Upvotes: 4
Reputation: 5860
You can use AWS cloudwatch metrics for s3 to see exact count for each bucket.
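For example, here is a minimal boto3 sketch of that approach, assuming the bucket reports the standard daily NumberOfObjects metric; the bucket name and region are placeholders:
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# NumberOfObjects is a daily metric, so query a multi-day window
# with a period of one day (86400 seconds).
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="NumberOfObjects",
    Dimensions=[
        {"Name": "BucketName", "Value": "MY_BUCKET_NAME"},  # placeholder
        {"Name": "StorageType", "Value": "AllStorageTypes"},
    ],
    StartTime=datetime.utcnow() - timedelta(days=2),
    EndTime=datetime.utcnow(),
    Period=86400,
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], int(point["Average"]))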
Upvotes: 43
Reputation: 1076
There is an easy solution with the S3 API now (available in the AWS CLI):
aws s3api list-objects --bucket BUCKETNAME --output json --query "[length(Contents[])]"
or for a specific folder:
aws s3api list-objects --bucket BUCKETNAME --prefix "folder/subfolder/" --output json --query "[length(Contents[])]"
Upvotes: 79
Reputation: 3620
Go to AWS Billing, then reports, then AWS Usage reports. Select Amazon Simple Storage Service, then Operation StandardStorage. Then you can download a CSV file that includes a UsageType of StorageObjectCount that lists the item count for each bucket.
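Once the CSV is downloaded, a short script can pull the count out. Here's a rough Python sketch; the file name and the exact column headers are assumptions, so check them against your actual report:
import csv

# Hypothetical file name; column headers may differ between report
# versions, so inspect the downloaded CSV first and adjust.
with open("aws_usage_report.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row.get("UsageType") == "StorageObjectCount":
            print(row.get("Resource"), row.get("UsageValue"))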
Upvotes: 6
Reputation: 547
If you use the s3cmd command-line tool, you can get a recursive listing of a particular bucket and output it to a text file.
s3cmd ls -r s3://logs.mybucket/subfolder/ > listing.txt
Then, on Linux, you can run wc -l on the file to count the lines (one line per object).
wc -l listing.txt
Upvotes: 53
Reputation: 2803
I used the python script from scalablelogic.com (adding in the count logging). Worked great.
#!/usr/local/bin/python

import sys

from boto.s3.connection import S3Connection

s3bucket = S3Connection().get_bucket(sys.argv[1])
size = 0
totalCount = 0

for key in s3bucket.list():
    totalCount += 1
    size += key.size

print 'total size:'
print "%.3f GB" % (size * 1.0 / 1024 / 1024 / 1024)
print 'total count:'
print totalCount
Upvotes: 2
Reputation: 7200
There is no way, unless you:
1. list them all in batches of 1000 (which can be slow and suck bandwidth - amazon seems to never compress the XML responses), or
2. log into your account on S3, and go Account - Usage. It seems the billing dept knows exactly how many objects you have stored!
Simply downloading the list of all your objects will actually take some time and cost some money if you have 50 million objects stored. A sketch of option 1 follows below.
Also see this thread about StorageObjectCount - which is in the usage data.
An S3 API to get at least the basics, even if it was hours old, would be great.
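For reference, a minimal boto3 sketch of that batch-of-1000 listing, which is slow but exact; the bucket name is a placeholder:
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

count = 0
# Each page holds at most 1000 keys, so this makes one request per
# 1000 objects in the bucket.
for page in paginator.paginate(Bucket="MY_BUCKET_NAME"):  # placeholder
    count += page.get("KeyCount", 0)

print("total objects:", count)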
Upvotes: 48