Reputation: 2361
We have S3 'folders' (objects sharing a common prefix under a bucket) containing millions and millions of files, and we want to figure out the size of these folders.
Writing my own .NET application to list the S3 objects was easy enough, but the maximum number of keys per request is 1,000, so it's taking forever.
Using S3Browser to look at a 'folder's' properties is taking a long time too, I'm guessing for the same reason.
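For reference, a rough CLI equivalent of that list-everything-and-sum approach (bucket and prefix names here are placeholders; the CLI pages through the 1,000-key limit automatically before applying the query):
aws s3api list-objects-v2 --bucket my-bucket --prefix my-prefix/ \
    --query "[sum(Contents[].Size), length(Contents)]" --output json
It still has to pull every key, which is exactly why it crawls on millions of objects.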
I've had this .NET application running for a week - I need a better solution.
Is there a faster way to do this?
Upvotes: 23
Views: 27094
Reputation: 5107
Today the best solution is setting up S3 Storage Lens: https://aws.amazon.com/s3/storage-lens/
The question asks how to do this quickly, and all of the other answers here - both the aws cli ls command approach and the "get total size" action in the online console - actually have to loop through every single object and add up its size in real time to get you the answer. That means a solution that's "quick" for one bucket can be unusably slow for something like a 200GB backups bucket made up of a huge number of small files.
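If you would rather script the setup than click through the console, the dashboard can also be created through the s3control API. Here is a minimal sketch with the AWS CLI; the account ID, config ID, and the exact JSON field names are assumptions, so verify them against the put-storage-lens-configuration reference:
# lens.json - minimal Storage Lens configuration (field names assumed; check the API reference)
# {
#   "Id": "my-dashboard",
#   "AccountLevel": { "BucketLevel": {} },
#   "IsEnabled": true
# }
aws s3control put-storage-lens-configuration \
    --account-id 123456789012 \
    --config-id my-dashboard \
    --storage-lens-configuration file://lens.json
Once the dashboard has collected metrics, you get per-bucket (and, on the advanced tier, per-prefix) size and object-count numbers without enumerating a single key.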
Upvotes: -1
Reputation: 1081
I prefer using the AWSCLI. I find that the web console often times out when there are too many objects.
start=s3://bucket/ && \
for prefix in `aws s3 ls $start | awk '{print $2}'`; do
    echo ">>> $prefix <<<"
    # the last two lines of the --summarize output are the object count and total size
    aws s3 ls $start$prefix --recursive --summarize | tail -n2
done
or in one line form:
start=s3://bucket/ && for prefix in `aws s3 ls $start | awk '{print $2}'`; do echo ">>> $prefix <<<"; aws s3 ls $start$prefix --recursive --summarize | tail -n2; done
Output looks something like:
$ start=s3://bucket/ && for prefix in `aws s3 ls $start | awk '{print $2}'`; do echo ">>> $prefix <<<"; aws s3 ls $start$prefix --recursive --summarize | tail -n2; done
>>> extracts/ <<<
Total Objects: 23
Total Size: 10633858646
>>> hackathon/ <<<
Total Objects: 2
Total Size: 10004
>>> home/ <<<
Total Objects: 102
Total Size: 1421736087
Upvotes: 12
Reputation: 1096
Seems like AWS added a menu item in the S3 console where it's possible to see the size: select the prefix and choose Actions > Calculate total size.
Upvotes: 21
Reputation: 540
The AWS CLI's ls command can do this:
aws s3 ls --summarize --human-readable --recursive s3://$BUCKETNAME/$PREFIX --region $REGION
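For example, with a hypothetical bucket and prefix, piping through tail keeps just the two summary lines (the same "Total Objects" / "Total Size" output shown in the loop-based answer above):
aws s3 ls --summarize --human-readable --recursive s3://my-bucket/backups/ --region us-east-1 | tail -n2
Note that this still lists every object under the prefix, so it can take a long time on millions of keys.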
Upvotes: 34
Reputation: 3048
I don't think an ideal solution exists, but here are some ideas you can develop further:
Upvotes: 5
Reputation: 969
If they're throttling you to 1,000 keys per request, I'm not certain how PowerShell is going to help, but if you want the size of a bunch of folders, something like this should do it.
Save the following in a file called Get-FolderSize.ps1:
param
(
[Parameter(Position=0, ValueFromPipeline=$True, Mandatory=$True)]
[ValidateNotNullOrEmpty()]
[System.String]
$Path
)
function Get-FolderSize ($_ = (get-item .)) {
Process {
$ErrorActionPreference = "SilentlyContinue"
#? { $_.FullName -notmatch "\\email\\?" } <-- Exclude folders.
$length = (Get-ChildItem $_.fullname -recurse | Measure-Object -property length -sum).sum
$obj = New-Object PSObject
$obj | Add-Member NoteProperty Folder ($_.FullName)
$obj | Add-Member NoteProperty Length ($length)
Write-Output $obj
}
}
Function Class-Size($size)
{
IF($size -ge 1GB)
{
"{0:n2}" -f ($size / 1GB) + " GB"
}
ELSEIF($size -ge 1MB)
{
"{0:n2}" -f ($size / 1MB) + " MB"
}
ELSE
{
"{0:n2}" -f ($size / 1KB) + " KB"
}
}
Get-ChildItem $Path | Get-FolderSize | Sort-Object -Property Length -Descending | Select-Object -Property Folder, Length | Format-Table -Property Folder, @{ Label="Size of Folder" ; Expression = {Class-Size($_.Length)} }
Usage: .\Get-FolderSize.ps1 -Path \path\to\your\folders
Upvotes: 1