YD8877

Reputation: 10790

How to implement a distributed file upload solution?

I have a file-uploading site that currently runs on a single server, i.e., the same server handles both user uploads and content delivery.

What I want to implement is a CDN (content delivery network). I would like to buy a server farm and spread the files across the different servers; if I had a mechanism for doing that, it would balance my load much better.

However, I have a few questions regarding this:

Assuming my server farm consists of 10 servers for content delivery,

  1. Since on the user's end the upload script lives at one location only, i.e., <form action=upload.php>, it has to reside on a single server, correct? How can I duplicate the script across multiple servers and direct the user's upload data to the server with the least load?

  2. How should I determine which files go to which server? During the upload process, should I send each file to a randomly chosen server? If the user sends 10 files, should I send them to a random server? Is there a mechanism to send them to the server with the least load? Is there any other algorithm that can help determine which server each file should be sent to?

  3. How will the files be sent from the upload server to the CDN? Using FTP? Wouldn't that introduce additional overhead and the need for error checking, e.g., to detect a broken FTP connection and to verify that each file was transferred successfully?

Upvotes: 2

Views: 1257

Answers (1)

Brigand

Reputation: 86230

Assuming you're using an Apache server, there is a module called mod_proxy_balancer (it works on top of mod_proxy). It handles all of the load-balancing work behind the scenes. The user will never know the difference -- except that their downloads and uploads should be considerably faster.

  1. If you use this, you can put a complete copy of the site (including upload.php) on each server, and the balancer decides which server handles each request.

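  2. mod_proxy_balancer will handle this for you: it distributes incoming requests across the servers using a configurable method (by request count, by traffic, or by how busy each server currently is).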

  3. Each server can have its own subdomain. You keep a database on your 'main' server that maps each download page to the physical server the file is stored on. The main server then generates an on-the-fly URL containing a hash (a keyed hash such as an HMAC; hashing is not encryption) of some mix of personal and miscellaneous information, e.g., the user's IP and the time of day. This prevents hard-linking to the download and sends users through your download page, which increases your page hits. The download server then checks the hash and either accepts or denies the request.
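To illustrate what picking "the server with the least load" (item 2) means, here is a minimal sketch that routes each request to the backend with the fewest in-flight requests. This is similar in spirit to mod_proxy_balancer's `bybusyness` method, but it is not Apache's implementation; the `LeastBusyBalancer` class and the backend names are hypothetical.

```python
import threading
from contextlib import contextmanager


class LeastBusyBalancer:
    """Hypothetical helper: send each request to the backend with the fewest active requests."""

    def __init__(self, backends):
        # Active-request counter per backend, kept in the order given.
        self._active = {name: 0 for name in backends}
        self._lock = threading.Lock()

    def pick(self):
        # Choose the least busy backend; ties go to the first one listed.
        with self._lock:
            name = min(self._active, key=self._active.get)
            self._active[name] += 1
            return name

    def release(self, name):
        # Call when the request on `name` has finished.
        with self._lock:
            self._active[name] -= 1

    @contextmanager
    def request(self):
        # Hold a slot on the chosen backend for the duration of one upload or download.
        name = self.pick()
        try:
            yield name
        finally:
            self.release(name)
```

Wrapping each transfer in `with balancer.request() as host:` keeps the counters accurate even if the transfer raises an error partway through.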
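The database on the 'main' server from item 3 could be as simple as one table mapping each file to the host that stores it. The sketch below uses SQLite only to stay self-contained; the table layout, the function names, and the hostnames are hypothetical.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Hypothetical catalog on the main server: which physical server holds each file."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " file_id TEXT PRIMARY KEY,"
        " host TEXT NOT NULL)"
    )
    return db


def record_upload(db, file_id, host):
    # Store where the file ended up; `with db` commits the transaction.
    with db:
        db.execute("INSERT INTO files (file_id, host) VALUES (?, ?)", (file_id, host))


def host_for(db, file_id):
    # Return the hostname that serves this file, or None if it is unknown.
    row = db.execute("SELECT host FROM files WHERE file_id = ?", (file_id,)).fetchone()
    return row[0] if row else None
```

The download page looks up `host_for(db, file_id)` and builds its link to that server's subdomain.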

If everything checks out, the download starts; your load is balanced; and the users don't have to worry about any of this behind-the-scenes work.
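The hash check described in item 3 could look roughly like the sketch below, assuming the main server and every download server share a secret key. The token is an HMAC-SHA256 over the file ID, the client IP, and an expiry timestamp (standing in for the "time of day" part). The secret, URL layout, hostnames, and function names are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"change-me"  # hypothetical key shared by the main server and the download servers


def _sign(file_id, client_ip, expires):
    # Keyed hash over the values the download server will re-check.
    message = f"{file_id}|{client_ip}|{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_download_url(host, file_id, client_ip, ttl=300, now=None):
    """Main server: build a short-lived link to the server that stores file_id."""
    now = int(time.time()) if now is None else now
    expires = now + ttl
    query = urlencode({"expires": expires, "sig": _sign(file_id, client_ip, expires)})
    return f"https://{host}/files/{file_id}?{query}"


def check_download(url, client_ip, now=None):
    """Download server: accept only unexpired links signed for this client IP."""
    now = int(time.time()) if now is None else now
    parts = urlsplit(url)
    file_id = parts.path.rsplit("/", 1)[-1]
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    if now > expires:
        return False  # link has expired, so old hard links stop working
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(sig, _sign(file_id, client_ip, expires))
```

Because the signature covers the client IP and an expiry time, a copied link fails on another machine or after the time-to-live, which is what forces visitors back through the download page.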

Note: I have done Apache administration and web development, but I have never managed a large CDN, so this is based on what I have seen on other sites and on general knowledge. If anyone has something to add or corrections to make, please do.

Update

There are also companies that manage it for you. A simple Google search will get you a list.

Upvotes: 3
