Adam

Reputation: 4780

Amazon S3 file and path naming architecture decisions

I am interested in learning why so many services like Twitter and Facebook name their CDN files the way they do. Looking at http://25.media.tumblr.com/tumblr_m6m6g57NgY1qdhfhho2_1280.jpg, I have some observational questions:

  1. Do they use multiple subdomains (25.media, 26.media, etc.) to offload DNS queries from a single domain? It seems that a single hostname such as storage.tumblr.com would be enough for all their images, since objects in an S3 bucket live in one big flat namespace.
  2. Are they inserting a hashed string into the file name to prevent a web-harvesting tool from walking the files sequentially? That seems like a good idea: take the file name, append some junk to it, hash the result, and insert that hash into the tumblr_XXXXXXXXXXXXXXXXXX_1280.jpg file name.
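
The scheme described in item 2 (append some secret "junk" to an identifier, hash it, and embed the hash in the name) can be sketched as follows. This is a minimal illustration under stated assumptions, not Tumblr's actual method: it uses HMAC-SHA256, a standard keyed-hash construction, in place of a plain hash of name-plus-junk, and the secret, the `obfuscated_name` helper, the `tumblr_` prefix reuse, and the 18-character token length are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret: the "junk" mixed in before hashing.
# Anyone who knows it can recompute names, so it must stay private.
SECRET = b"replace-with-a-long-random-secret"


def obfuscated_name(original_id: str, size: int, prefix: str = "tumblr") -> str:
    """Build a name like '<prefix>_<token>_<size>.jpg' with an unguessable token.

    The token is an HMAC-SHA256 of the original identifier keyed with SECRET,
    truncated to 18 hex characters (an arbitrary illustrative length).
    """
    digest = hmac.new(SECRET, original_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:18]}_{size}.jpg"


if __name__ == "__main__":
    # Consecutive internal IDs produce names with no visible ordering.
    for post_id in ("post-1001", "post-1002", "post-1003"):
        print(obfuscated_name(post_id, 1280))
```

Because the same input always yields the same name, the server can regenerate a file's URL on demand, while an outsider without the secret cannot predict the name of the next file in a sequence.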

Upvotes: 3

Views: 607

Answers (2)

Bryan

Reputation: 1

  1. Another possible reason for the multiple subdomains is that they may be using multiple media containers because of limits on the number of objects each container can hold (or should hold to keep things running quickly, since too many objects in a single container can slow things down).
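
If files are spread across several containers, each file needs a stable home so it can be found again. A minimal sketch, assuming a fixed number of containers each exposed under its own hostname: a stable hash of the filename picks the container. The shard count, the `shard_for` and `media_url` helpers, and the `N.media.example.com` hostname pattern are hypothetical.

```python
import hashlib

# Hypothetical number of storage containers / media hostnames.
NUM_SHARDS = 4


def shard_for(filename: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministically map a filename to a shard index in [0, num_shards).

    SHA-256 is used only as a stable spreading function here, not for security.
    Python's built-in hash() is salted per process by default for str, so it
    could send the same file to a different shard after a restart.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def media_url(filename: str) -> str:
    """Build a URL on the file's per-shard hostname (pattern is hypothetical)."""
    return f"https://{shard_for(filename)}.media.example.com/{filename}"


if __name__ == "__main__":
    for name in ("a.jpg", "b.jpg", "c.jpg"):
        print(media_url(name))
```

Keeping the mapping deterministic also means a given image always appears under the same hostname, so browsers can reuse cached copies. One caveat with plain modulo: changing the shard count moves most files to a different shard, which is why consistent hashing is often used when the number of containers is expected to grow.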

Upvotes: 0

Geoff Appleford

Reputation: 18832

  1. Browsers limit how many parallel requests they make to a single domain, so spreading files across multiple subdomains allows more parallel requests. See: http://yuiblog.com/blog/2007/04/11/performance-research-part-4/

  2. They might be using the seemingly random filenames for the reason you describe. More likely, though, they use them to ensure filename uniqueness and to invalidate caches when a file changes, thereby ensuring that all users see the latest version.
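
One common way to get both uniqueness and cache invalidation is to name each file after a hash of its contents. A minimal sketch, assuming the original extension is kept and the hash is truncated to 16 hex characters (an illustrative choice); the `content_addressed_name` helper is hypothetical and not confirmed as Tumblr's scheme.

```python
import hashlib
from pathlib import PurePosixPath


def content_addressed_name(data: bytes, original_name: str, length: int = 16) -> str:
    """Name a file after a SHA-256 hash of its bytes, keeping the extension.

    Identical bytes always get the same name; any change to the bytes yields
    a new name, so the updated file is published under a new URL and clients
    cannot be served a stale cached copy for it.
    """
    ext = PurePosixPath(original_name).suffix  # e.g. ".jpg"
    digest = hashlib.sha256(data).hexdigest()[:length]
    return f"{digest}{ext}"


if __name__ == "__main__":
    print(content_addressed_name(b"old pixels", "photo.jpg"))
    print(content_addressed_name(b"new pixels", "photo.jpg"))
```

Since a given URL's content never changes under this scheme, such files can be served with long cache lifetimes; editing an image simply produces a new URL that pages then reference.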

Upvotes: 4
