Reputation: 91
What's a good rule of thumb for determining whether to scale the number of cloud-based web servers I have running up or down? Are there any other metrics besides request execution time, processor utilization, available memory, and requests per second that should be monitored for this purpose? Should a weighted average, standard deviation, or some other calculation be used to decide when to scale? And finally, are there particular threshold values that work best for determining when to add or remove server instances?
Upvotes: 1
Views: 1162
Reputation: 3967
Your question is a hot research area right now. That said, cloud providers can automate web-server scaling in different ways; for the details of how it works and which metrics affect scaling up and down, you can glance at this paper.
Amazon has announced Elastic Beanstalk, which lets you deploy an application to Amazon’s EC2 (Elastic Compute Cloud) and have it scale up or down according to demand by launching or terminating server instances. There is no additional cost for using Elastic Beanstalk; you pay only for the instances you use.
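For illustration only, here is a minimal sketch using today's boto3 SDK (which postdates this answer) of creating an Elastic Beanstalk environment with the instance-count bounds it is allowed to scale within. The application name, environment name, and solution stack string are placeholders, not values from the answer; the stack string in particular must match one your account actually offers:

```python
import boto3

eb = boto3.client("elasticbeanstalk", region_name="us-east-1")

# Placeholder names: the application must already exist, and the solution
# stack string must match one returned by list_available_solution_stacks().
eb.create_environment(
    ApplicationName="my-app",
    EnvironmentName="my-app-prod",
    SolutionStackName="64bit Amazon Linux 2 v3.5.3 running Python 3.8",
    OptionSettings=[
        # Bounds within which Beanstalk may launch or terminate instances.
        {"Namespace": "aws:autoscaling:asg", "OptionName": "MinSize", "Value": "1"},
        {"Namespace": "aws:autoscaling:asg", "OptionName": "MaxSize", "Value": "4"},
    ],
)
```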
You can also check out Auto Scaling, which Amazon AWS offers:
Auto Scaling allows you to scale your Amazon EC2 capacity automatically up or down according to conditions you define. With Auto Scaling, you can ensure that the number of Amazon EC2 instances you’re using increases seamlessly during demand spikes to maintain performance and decreases automatically during demand lulls to minimize costs. Auto Scaling is particularly well suited for applications that experience hourly, daily, or weekly variability in usage. Auto Scaling is enabled by Amazon CloudWatch and available at no additional charge beyond Amazon CloudWatch fees.
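To make "conditions you define" concrete, here is a minimal sketch using the modern boto3 SDK (again, an API that postdates this answer) that attaches a target-tracking policy to an existing Auto Scaling group, so AWS adds or removes instances to hold average CPU near a setpoint. The group name and the 50% target are assumptions for illustration:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-web-asg",   # assumed pre-existing group
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        # Scale so that average CPU across the group hovers near 50%.
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```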
I recommend reading the details from Amazon AWS to dig into how their systems scale web servers up and down.
Upvotes: 1
Reputation: 41718
This question of dynamically allocating compute instances brings back memories of my control systems classes in engineering school. It seems we should be able to apply classical digital control algorithms (think PID loops and Z-transforms) to scaling servers. Spinning up a server instance is analogous to moving an engine throttle's stepper motor one notch to increase the fuel and oxygen rate in response to increased load. Respond too slowly and the performance is sluggish (overdamped); respond too quickly and the system overshoots and oscillates (underdamped).
In both the compute and physical domains, the goal is to have resources match load. The good news is that compute systems are among the easier control systems to deal with: having too many resources doesn't cause instability, it just costs money, much like an electrical system that uses a resistor bank to burn off excess generation.
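To make the control-loop analogy concrete, here is a minimal sketch of a discrete PID controller driving an instance count, with a toy simulation attached. The setpoint, gains, and load model are all hypothetical placeholders, not anyone's production values; real gains would need tuning against your own workload, and a real controller would also want anti-windup on the integral term:

```python
class PIDScaler:
    """Discrete PID controller: a positive error (CPU above the setpoint)
    pushes the suggested instance count up."""

    def __init__(self, setpoint=50.0, kp=0.05, ki=0.005, kd=0.01,
                 min_instances=2, max_instances=20):
        self.setpoint = setpoint
        self.kp, self.ki, self.kd = kp, ki, kd
        self.min_instances, self.max_instances = min_instances, max_instances
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, avg_cpu, current_count, dt=1.0):
        error = avg_cpu - self.setpoint
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Round the continuous control signal to a whole number of servers,
        # clamped so overly aggressive gains can't cause runaway scaling.
        return max(self.min_instances,
                   min(self.max_instances, current_count + round(output)))


# Toy simulation: total demand is fixed, so per-instance CPU falls as
# servers are added; the loop settles where average CPU nears the setpoint.
if __name__ == "__main__":
    scaler = PIDScaler()
    count, total_load = 2, 400.0  # 400 "CPU units" of demand
    for step in range(10):
        avg_cpu = min(100.0, total_load / count)
        count = scaler.update(avg_cpu, count)
        print(f"step {step}: avg_cpu={avg_cpu:.1f}% -> {count} instances")
```

Cranking the gains up makes the loop underdamped (the fleet size oscillates); turning them down makes it overdamped (scaling lags behind load), exactly as with the throttle.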
It’s great to see how the fundamentals keep coming around again! We learned all that for a reason.
Upvotes: 3