b15

Reputation: 2361

Quickly finding the size of an S3 'folder'

We have S3 'folders' (objects sharing a prefix within a bucket) containing millions and millions of files, and we want to figure out the size of these folders.

Writing my own .NET application to list the S3 objects was easy enough, but the maximum number of keys per request is 1000, so it's taking forever.

Using S3Browser to look at a folder's properties takes a long time too, I'm guessing for the same reason.

I've had this .NET application running for a week - I need a better solution.

Is there a faster way to do this?

Upvotes: 23

Views: 27094

Answers (6)

squarecandy

Reputation: 5107

Today the best solution is to set up S3 Storage Lens: https://aws.amazon.com/s3/storage-lens/

The question asks for a "quick" way, but every other answer here, whether the AWS CLI ls approach or the "get total size" command in the online console, has to loop through every single object and add up its size in real time to get you the answer. A solution that is "quick" for one bucket can therefore be unusably slow for something like a 200GB backups bucket made up of a huge number of small files.
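To get a feel for why any listing-based approach is slow at this scale, here is a rough back-of-the-envelope sketch. The 1000-keys-per-request limit comes from the question; the 50 million object count and the 0.2 seconds per request are purely assumed figures for illustration, and the function name is hypothetical.

```python
import math


def sequential_listing_estimate(num_objects, keys_per_request=1000,
                                seconds_per_request=0.2):
    """Return (requests, seconds) needed to list num_objects one page at a time."""
    # Each LIST request returns at most keys_per_request keys, and the next
    # page cannot be requested until the previous one has come back.
    requests = math.ceil(num_objects / keys_per_request)
    return requests, requests * seconds_per_request


# Assumed example: 50 million objects under one prefix.
requests, seconds = sequential_listing_estimate(50_000_000)
print(requests, "requests, about", round(seconds / 3600, 1), "hours")
```

The real per-request latency depends on your network and client, so treat the result as an order of magnitude only.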

Upvotes: -1

debugme

Reputation: 1081

I prefer using the AWSCLI. I find that the web console often times out when there are too many objects.

  • replace s3://bucket/ with where you want to start from.
  • relies on awscli, awk, tail, and some bash-like shell
start=s3://bucket/ && \
for prefix in $(aws s3 ls "$start" | awk '$1 == "PRE" {print $2}'); do
  echo ">>> $prefix <<<"
  aws s3 ls "$start$prefix" --recursive --summarize | tail -n2
done

or in one line form:

start=s3://bucket/ && for prefix in $(aws s3 ls "$start" | awk '$1 == "PRE" {print $2}'); do echo ">>> $prefix <<<"; aws s3 ls "$start$prefix" --recursive --summarize | tail -n2; done

Output looks something like:

$ start=s3://bucket/ && for prefix in $(aws s3 ls "$start" | awk '$1 == "PRE" {print $2}'); do echo ">>> $prefix <<<"; aws s3 ls "$start$prefix" --recursive --summarize | tail -n2; done
>>> extracts/ <<<
Total Objects: 23
   Total Size: 10633858646
>>> hackathon/ <<<
Total Objects: 2
   Total Size: 10004
>>> home/ <<<
Total Objects: 102
   Total Size: 1421736087
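
If you want to sort or post-process that output, a small parser along these lines might help. This is only a sketch: it assumes the exact ">>> prefix <<<" header format printed by the loop above and raw byte counts (no --human-readable), and parse_summaries is a hypothetical helper name.

```python
import re


def parse_summaries(text):
    """Map each '>>> prefix <<<' header to its (object_count, total_bytes)."""
    results = {}
    prefix = None
    objects = None
    for line in text.splitlines():
        line = line.strip()
        header = re.fullmatch(r">>> (.+) <<<", line)
        if header:
            # A new prefix section starts; forget the previous object count.
            prefix, objects = header.group(1), None
        elif line.startswith("Total Objects:"):
            objects = int(line.split(":", 1)[1])
        elif (line.startswith("Total Size:")
              and prefix is not None and objects is not None):
            results[prefix] = (objects, int(line.split(":", 1)[1]))
    return results


# The sample output shown above, pasted in as a string.
sample = """>>> extracts/ <<<
Total Objects: 23
   Total Size: 10633858646
>>> hackathon/ <<<
Total Objects: 2
   Total Size: 10004
>>> home/ <<<
Total Objects: 102
   Total Size: 1421736087
"""

# Print the prefixes largest first, with sizes converted to GiB.
by_size = sorted(parse_summaries(sample).items(),
                 key=lambda item: item[1][1], reverse=True)
for prefix, (count, size) in by_size:
    print(f"{size / 1024**3:8.2f} GiB  {count:6d} objects  {prefix}")
```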

Upvotes: 12

Filippo Loddo

Reputation: 1096

It seems AWS has added a menu item to the console that shows a folder's size.

[Screenshot: size of S3 folder]

Upvotes: 21

Foolish Brilliance

Reputation: 540

The AWS CLI's ls command can do this:

aws s3 ls --summarize --human-readable --recursive s3://$BUCKETNAME/$PREFIX --region $REGION
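
A rough programmatic equivalent, in case you need the totals inside a script rather than on the command line. This is a sketch assuming Python with boto3 installed and credentials already configured; summarize_pages and prefix_size are hypothetical names. Like the CLI command, it still walks every object at up to 1000 keys per request, so it is not inherently faster.

```python
def summarize_pages(pages):
    """Add up object count and total bytes from ListObjectsV2 response pages."""
    count = 0
    size = 0
    for page in pages:
        # A page with no matching keys has no "Contents" entry at all.
        for obj in page.get("Contents", []):
            count += 1
            size += obj["Size"]
    return count, size


def prefix_size(bucket, prefix):
    """Return (object_count, total_bytes) for everything under a prefix."""
    # Imported lazily so summarize_pages can be used without boto3 installed.
    import boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    return summarize_pages(paginator.paginate(Bucket=bucket, Prefix=prefix))
```

Calling prefix_size with your own bucket name and prefix returns the object count and the total size in bytes.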

Upvotes: 34

MatteoSp

Reputation: 3048

I don't think an ideal solution exists, but here are some ideas you can develop further:

  1. Is the app the only means by which files are written to S3? If so, you can store the file sizes (in a database, a file, or whatever) and sum them when necessary.
  2. Make concurrent calls to the LIST API.
  3. Can you switch from an organisation based on folders to one based on buckets? If so, you could query the billing API (yes, the billing) and calculate the size (or an approximation of it) from the cost...
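
The concurrent-listing idea could be sketched roughly as follows, assuming Python with boto3 and a key space that is already split into several sub-prefixes. Listing within a single prefix is still sequential, because each page depends on the previous one, so the parallelism only comes from running several prefixes at once. The names sizes_by_prefix and s3_prefix_bytes are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sizes_by_prefix(prefixes, size_of, max_workers=8):
    """Run size_of(prefix) for each prefix in parallel; return {prefix: bytes}."""
    prefixes = list(prefixes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input prefixes.
        return dict(zip(prefixes, pool.map(size_of, prefixes)))


def s3_prefix_bytes(bucket):
    """Build a size_of function that sums object sizes under one prefix."""
    # Imported lazily so sizes_by_prefix can be used without boto3 installed.
    import boto3

    client = boto3.client("s3")

    def size_of(prefix):
        paginator = client.get_paginator("list_objects_v2")
        return sum(
            obj["Size"]
            for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
            for obj in page.get("Contents", [])
        )

    return size_of
```

With placeholder names, sizes_by_prefix(["extracts/", "home/"], s3_prefix_bytes("bucket")) would list both prefixes at the same time and return their sizes in bytes.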

Upvotes: 5

Jeffrey Eldredge

Reputation: 969

If they're throttling you to 1000 keys per request, I'm not certain how PowerShell is going to help, but if you want the size of a bunch of folders, something like this should do it.

Save the following in a file called Get-FolderSize.ps1:

param
(
    [Parameter(Position=0, ValueFromPipeline=$True, Mandatory=$True)]
    [ValidateNotNullOrEmpty()]
    [System.String]
    $Path
)

function Get-FolderSize ($_ = (get-item .))  {
  Process {
    $ErrorActionPreference = "SilentlyContinue"
    #? { $_.FullName -notmatch "\\email\\?" }  <-- Exclude folders.
    $length = (Get-ChildItem $_.fullname -recurse | Measure-Object -property length -sum).sum
    $obj = New-Object PSObject
    $obj | Add-Member NoteProperty Folder ($_.FullName)
    $obj | Add-Member NoteProperty Length ($length)
     Write-Output $obj
  }
}

Function Class-Size($size)
{

    IF($size -ge 1GB)
    {
        "{0:n2}" -f  ($size / 1GB) + " GB"
    }
    ELSEIF($size -ge 1MB)
    {
        "{0:n2}" -f  ($size / 1MB) + " MB"
    }
    ELSE
    {
        "{0:n2}" -f  ($size / 1KB) + " KB"
    }
}

Get-ChildItem $Path | Get-FolderSize | Sort-Object -Property Length -Descending | Select-Object -Property Folder, Length | Format-Table -Property Folder, @{ Label="Size of Folder" ; Expression = {Class-Size($_.Length)} }

Usage: .\Get-FolderSize.ps1 -Path \path\to\your\folders

Upvotes: 1
