Reputation: 3059
I've been tasked to work on a project for a client whose site he estimates will receive 1-2M hits per day. He has an existing database of 58M users that needs to be seeded, on a per-registration basis, into the new brand. Most of the site's content is served from external API data; what we store in our Mongo setup is mostly profile information and saved API parameters.
Nginx will be on port 80, load balancing to a Node cluster on ports 8000-8010.
My question is what to do about caching. I come from a LAMP background, so I'm used to either writing static HTML files with PHP and serving those up to minimize MySQL load, or using Memcached on sites that required a higher level of caching. This setup is a bit foreign to me.
Which of the following is most ideal in terms of minimal response time and CPU load?
Option 1: Nginx page-level caching. Reference: http://andytson.com/blog/2010/04/page-level-caching-with-nginx/
server {
    listen      80;
    server_name mysite.com;

    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Host      $host;

    location / {
        proxy_pass  http://localhost:8080/;
        proxy_cache anonymous;
    }

    # don't cache the admin folder; send all requests straight through the proxy
    location /admin {
        proxy_pass http://localhost:8080/;
    }

    # handle static files directly. Set their expiry time to max, so they'll
    # always use the browser cache after the first request
    location ~* \.(css|js|png|jpe?g|gif|ico)$ {
        root    /var/www/${host}/http;
        expires max;
    }
}
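For `proxy_cache anonymous;` to work, the `anonymous` cache zone has to be declared at the `http` level, and since the Node cluster listens on ports 8000-8010, an `upstream` block would replace the single `localhost:8080` backend. A sketch, with paths and sizes as illustrative assumptions:

```nginx
# http-level context (e.g. in nginx.conf); path and sizes are illustrative
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=anonymous:10m
                 max_size=1g inactive=10m;

upstream node_cluster {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    # ... through 127.0.0.1:8010
}
```

The `location` blocks would then use `proxy_pass http://node_cluster;` so requests spread across the cluster instead of hitting one port.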
Option 2: Redis, keyed on a hash of the API URL. The hash() function is the numbers() function on this page: http://jsperf.com/hashing-strings
function hash(str) {
    var res = 0,
        len = str.length;
    for (var i = 0; i < len; i++) {
        res = res * 31 + str.charCodeAt(i);
    }
    return res;
}
var apiUrl = 'https://www.myexternalapi.com/rest/someparam/someotherparam/?auth=3dfssd6s98d7f09s8df98sdef';
var key = hash(apiUrl).toString(); // 1.8006908172911553e+136
myRedisClient.set(key,theJSONresponse, function(err) {...});
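One caveat worth flagging: `res * 31` on a JS Number loses integer precision once the value passes 2^53 (the `1.8e+136` comment above shows it has long since overflowed), so distinct URLs can silently collide. A 32-bit sketch using `Math.imul`, where the arithmetic stays exact (`hash32` is a hypothetical name, not from the question):

```javascript
// 32-bit variant of the multiplicative hash above. Math.imul keeps
// every intermediate product in 32-bit integer range, where the math
// is exact, instead of overflowing Number precision on long strings.
function hash32(str) {
    var res = 0;
    for (var i = 0; i < str.length; i++) {
        res = (Math.imul(res, 31) + str.charCodeAt(i)) | 0;
    }
    return res >>> 0; // reinterpret as unsigned
}

var key = 'api:' + hash32('https://www.myexternalapi.com/rest/a/b').toString(16);
```

Collisions are still possible in 32 bits, but the result is deterministic and compact enough to use as a cache key prefix.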
Option 3: Writing the JSON responses to flat files, using the same hash() function as above:
var fs = require('fs');
var apiUrl = 'https://www.myexternalapi.com/rest/someparam/someotherparam/?auth=3dfssd6s98d7f09s8df98sdef';
var key = hash(apiUrl).toString(); // 1.8006908172911553e+136
fs.writeFile('/var/www/_cache/' + key + '.json', theJSONresponse, function(err) {...});
Option 4: Varnish. I did some research, and benchmarks like the ones shown on this site are leaning me away from this solution, but I'm still open to considering it if it makes the most sense: http://todsul.com/nginx-varnish
Upvotes: 20
Views: 8243
Reputation: 9914
Nginx page-level caching is good for caching static content. But for dynamic content it's no good: how do you invalidate the cache when the content changes in the upstream?
Redis is perfect as an in-memory data store, but I don't like to use it as a cache. With a limited amount of memory, I'd constantly worry about running out. Yes, you can set up a key-expiry strategy in Redis, but that's extra work, and it's still not as good as I'd want a cache provider to be.
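(For reference, the eviction setup being alluded to comes down to two redis.conf directives; the size is illustrative:)

```
maxmemory 256mb
maxmemory-policy allkeys-lru
```

With `allkeys-lru`, Redis evicts least-recently-used keys once the cap is hit; whether that counts as acceptable extra work is the judgment call being made here.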
I have no experience with choices 3 and 4.
I'm surprised you don't include memcached as an option. In my experience it's solid as a cache provider. One memcached "feature" that Redis doesn't have: it does not guarantee that a key will survive until the expiry time you specified. That's bad for a data store, but it makes memcached a perfect candidate for caching: you don't need to worry about using up the memory you've assigned to it, because memcached will evict less-used keys (the less-used cache entries) even when their expiry times haven't been reached yet.
Nginx provides a built-in memcached module. It's solid, and there are a number of tutorials online.
This is the one I like the most (see the link below). Cache invalidation is easy: for example, when a page is updated in the upstream, just delete the memcached key from the upstream app server. The author claims a 4x improvement in response time. I believe it's good enough for your use case.
http://www.igvita.com/2008/02/11/nginx-and-memcached-a-400-boost/
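The pattern in that article boils down to something like this (a sketch; the fallback name and ports are illustrative):

```nginx
location / {
    set $memcached_key $uri;
    memcached_pass 127.0.0.1:11211;
    default_type   text/html;
    # on a miss, fall through to the app, which renders the page
    # and writes it into memcached for the next request
    error_page 404 502 504 = @app;
}

location @app {
    proxy_pass http://localhost:8080;
}
```

Nginx serves hits straight out of memcached without touching the app at all, which is where the speedup comes from.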
Upvotes: 4
Reputation: 2964
As for Varnish, I don't intend to decipher the benchmarks on the site you found, but I can tell you they are awfully bad numbers that have nothing in common with a real high-traffic implementation (search for Varnish optimizations and you'll see benchmarks showing 100-200k req/s instead of 8k).
Nginx is also an OK choice for a page cache, and with 1-2M hits a day you don't need extreme performance. So go with whichever you feel more comfortable working with.
The two Node-side solutions are really the weaker choices. A page cache should be separate from your dynamic application, for both reliability and performance.
Moreover, Redis/memcached will best help you scale the application if you use them as an object cache, or a cache for commonly used deserialized data.
Upvotes: 1
Reputation: 77956
I would do a combination: use Redis to cache per-user API calls with a short TTL, and use Nginx to cache long-term RESTless data and static assets. I wouldn't write JSON files, as I'd imagine the filesystem I/O would be the slowest and most CPU-intensive of the options listed.
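The Redis side of that split is essentially SETEX semantics (set a key with a short expiry). An in-memory stand-in to sketch the idea (`TtlCache` is illustrative, not a real library):

```javascript
// In-memory stand-in for the Redis side of the split: short-TTL
// caching of per-user API responses. Redis itself does this with
// SETEX key seconds value.
function TtlCache() {
    this.store = new Map();
}
TtlCache.prototype.set = function (key, value, ttlMs) {
    this.store.set(key, { value: value, expires: Date.now() + ttlMs });
};
TtlCache.prototype.get = function (key) {
    var entry = this.store.get(key);
    if (!entry) return null;
    if (Date.now() > entry.expires) {
        this.store.delete(key); // lazy expiry on read
        return null;
    }
    return entry.value;
};

var cache = new TtlCache();
cache.set('user:42:feed', { items: [1, 2, 3] }, 5000);
```

With Redis you'd get the same behavior across all Node workers instead of per-process, which is what makes it the right home for session-scoped API responses.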
Upvotes: 18