Matthew Nichols

Reputation: 5035

Storing lots of files in Azure Storage

I am building an app that will need to store lots (roughly 250,000) of smallish (2MB to 10MB) files. I want to use Azure Storage for this, as the rest of the related systems are on Azure. Each file will have a unique name (probably a GUID). What I am a little uncertain about is how blobs correspond to files. Since I know that each file will be uniquely named, can/should I just store one file per blob in a single container?

Still getting my head around the Azure Storage concepts, so apologies if this is annoyingly naive.

Upvotes: 3

Views: 2247

Answers (2)

Igorek

Reputation: 15860

You can absolutely store all files within a single container. There is no limit on the number of blobs in a container; the only ceiling is the capacity of the storage account itself (originally 100TB, since raised to 500TB).

Each blob gets its own storage partition, which means your files will be spread in a massively scalable way across potentially many servers.

The only drawback of storing everything in a single container is that enumerating it is slow: listing the blob names in a container that large means paging through the results (at most 5,000 per request). Purging by criteria such as date is also hard. If you ever need to purge blobs in bulk, consider a storage layout that lets you delete a whole container at a time.
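A minimal sketch of the container-at-a-time purge idea, assuming one container per calendar month and a hypothetical `files-YYYY-MM` naming convention. It only decides which containers are past retention; the actual deletion would go through whatever Azure client you use. The helper names are illustrative and not part of any Azure SDK.

```python
from datetime import date

# Hypothetical convention: one container per month, e.g. "files-2024-05".
# Container names must be 3-63 chars of lowercase letters, digits and hyphens,
# which this format satisfies.
PREFIX = "files"


def container_for(d: date) -> str:
    """Return the container that files uploaded on date d belong in."""
    return f"{PREFIX}-{d.year:04d}-{d.month:02d}"


def months_index(year: int, month: int) -> int:
    """Count months from year 0 so months can be compared as plain integers."""
    return year * 12 + (month - 1)


def containers_to_purge(existing: list[str], today: date, keep_months: int) -> list[str]:
    """Return containers whose whole month falls outside the retention window.

    The current month plus the previous (keep_months - 1) months are kept.
    Names that do not follow the convention are ignored.
    """
    cutoff = months_index(today.year, today.month) - (keep_months - 1)
    doomed = []
    for name in existing:
        parts = name.split("-")
        if len(parts) != 3 or parts[0] != PREFIX:
            continue  # not one of our monthly containers
        try:
            year, month = int(parts[1]), int(parts[2])
        except ValueError:
            continue  # malformed date part, leave it alone
        if months_index(year, month) < cutoff:
            doomed.append(name)
    return sorted(doomed)
```

For example, with `existing = ["files-2024-01", "files-2024-02", "files-2024-03", "logs"]`, calling `containers_to_purge(existing, date(2024, 3, 15), keep_months=2)` selects only `files-2024-01`. Deleting a container removes all of its blobs in one operation, which is the point of this layout.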

HTH

Upvotes: 4

Louis S. Berman

Reputation: 539

I created a similar blob storage repository (2.6 million files / 3.9TB), so I think my experience might be a good proxy for yours. I should say, however, that at least half of those files were 1KB or less, so my findings won't be an exact match.

Regardless, I had the same question as you: does container/folder organization affect retrieval speed? My tests showed that retrieval speed was virtually identical no matter how I organized the files.

For your case, I'd simply use a single container with no underlying "folder" structure, especially since you're going to use GUIDs as filenames. You'll never want to list all 250K files (unless you need a complete "directory" scan), so a flat organization scheme seems best.
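For the flat layout, here is a sketch of generating blob names, assuming the GUID comes from Python's `uuid.uuid4()` and that you want to keep the original file extension. `blob_name_for` is a hypothetical helper; the string it returns is what you would pass as the blob name when uploading into the single container. Azure blob names must be 1 to 1,024 characters, which a GUID-based name stays well under.

```python
import uuid
from pathlib import PurePath
from typing import Optional

MAX_BLOB_NAME = 1024  # Azure blob names must be 1-1024 characters long


def blob_name_for(original_filename: str, file_id: Optional[uuid.UUID] = None) -> str:
    """Build a flat blob name: a lowercase GUID plus the original extension.

    No "/" is used, so the name has no virtual folder structure.
    """
    if file_id is None:
        file_id = uuid.uuid4()  # random GUID, unique for practical purposes
    suffix = PurePath(original_filename).suffix.lower()  # e.g. ".pdf", or "" if none
    name = f"{file_id}{suffix}"  # str(UUID) is 36 lowercase hex chars with hyphens
    if not 1 <= len(name) <= MAX_BLOB_NAME:
        raise ValueError(f"blob name length {len(name)} out of range")
    return name
```

Passing a fixed `uuid.UUID("12345678-1234-5678-1234-567812345678")` together with `"Report.PDF"` yields `12345678-1234-5678-1234-567812345678.pdf`. Keeping the mapping from GUID back to the original filename is left to your own database.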

The only reason to pick an alternate scheme (with multiple containers and/or folders) would be if you wanted to periodically roll off a subset of the files (e.g., after a certain number of days).
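If you want to roll off files after a certain number of days while staying in one container, one option is to put the upload date in the blob name as a virtual folder prefix (e.g. `2024-05-01/<guid>`), since "folders" in blob storage are just name prefixes. The sketch below assumes that hypothetical naming scheme and works on a plain list of names; in practice the list would come from listing the container, and each selected blob would then be deleted individually.

```python
from datetime import date, timedelta


def dated_blob_name(upload_day: date, guid: str) -> str:
    """Hypothetical scheme: '<YYYY-MM-DD>/<guid>' so the date acts as a virtual folder."""
    return f"{upload_day.isoformat()}/{guid}"


def expired_blobs(names: list[str], today: date, retention_days: int) -> list[str]:
    """Return blob names whose date prefix is older than retention_days.

    Names without a parseable date prefix are left alone.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        prefix, sep, _ = name.partition("/")
        if not sep:
            continue  # no virtual folder, not part of this scheme
        try:
            uploaded = date.fromisoformat(prefix)
        except ValueError:
            continue  # prefix is not a date
        if uploaded < cutoff:
            expired.append(name)
    return expired
```

With a 7-day retention on 2024-01-31, `2024-01-01/a` is selected while `2024-01-25/b` is kept. Compared with one container per period, this keeps everything in a single container but trades a single container delete for one delete per expired blob.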

Upvotes: 5
