Reputation: 135
I am configuring a Chef server and expect to manage over 500 nodes through it, possibly close to 1000. Can I expect this to work effectively on, say, an extra-large EC2 instance? Should I consider running RabbitMQ, Solr, etc. on separate servers? Is it possible to run Chef Server itself in a distributed setup?
Upvotes: 3
Views: 3781
Reputation: 8258
Update
Chef 11 was released earlier this year. Alongside the release came a couple of press releases and case studies about companies Opscode worked with on scalability testing. Of note, Facebook and Cycle Computing managed 10,000+ node clusters with a single Chef server. The Chef Server's specifications are modest, but undisclosed. Further information is available here:
It is important to note that this applies to both Open Source Chef Server and Enterprise Chef. Opscode's Hosted Enterprise Chef service is essentially an enormous Enterprise Chef instance, since it runs basically "the same" code base (not precisely the same, as Opscode has customizations and additional services that are required to run a publicly accessible SaaS platform that multiple paying customers use).
This page on the Chef Wiki has a lot of good links and information:
Some points to consider:
The metric that matters isn't the number of nodes; it's the number of node converges you expect over time. For example, 500 nodes that run Chef once a day put less load on the server than 50 nodes that run Chef every 10 minutes. Of course, 500 nodes running Chef every 10 minutes (or even every 30 minutes, a common interval) is a lot of load on the system.
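To make the comparison concrete, here is a rough back-of-the-envelope sketch of average converges per hour for the scenarios above, plus the 1000-node case from the question. It assumes runs are spread evenly over the interval and treats every converge as equal work; in practice the cost of a run depends on its run list, searches, and so on, so treat these numbers only as a relative measure of load.

```python
def converges_per_hour(nodes: int, interval_minutes: float) -> float:
    """Average chef-client runs per hour, assuming runs are spread evenly over the interval."""
    return nodes * 60.0 / interval_minutes


# (label, node count, run interval in minutes)
scenarios = [
    ("500 nodes, once a day", 500, 24 * 60),
    ("50 nodes, every 10 minutes", 50, 10),
    ("500 nodes, every 30 minutes", 500, 30),
    ("500 nodes, every 10 minutes", 500, 10),
    ("1000 nodes, every 30 minutes", 1000, 30),
]

for label, nodes, interval in scenarios:
    per_hour = converges_per_hour(nodes, interval)
    # Also show the per-second rate, which is closer to what the API service sees.
    print(f"{label}: {per_hour:.1f}/hour ({per_hour / 3600:.2f}/second)")
```

This prints roughly 20.8/hour for 500 daily runs versus 300/hour for 50 nodes every 10 minutes, and 2000/hour (about 0.56/second) for 1000 nodes every 30 minutes.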
Chef Server was designed as a distributed system so the components could run on separate nodes. That is exactly how Opscode Hosted Chef and Opscode Private Chef work: various services run on separate systems to distribute the load. If you're expecting a lot of nodes running Chef often, you should absolutely run the services on separate systems. The Chef Configuration Settings page on the wiki describes the configuration options for the services.
Highly available and scalable are not the same thing, and they require different approaches. The differences between them are outside the scope of Chef entirely. The "Scalability and High Availability" page should help, though.
The Chef 10.x (or 0.10.x) release uses a Ruby-based API service and CouchDB as the back-end data store. At the scale of Hosted Chef, Opscode ran into scalability issues, which are described in Seth Falcon's talk at ChefConf 2012. While that talk is mainly about an awesome live migration of customer data, it makes several points about scalability with respect to CouchDB. Also, Chef 11 will move to an Erlang-based API service, with SQL (MySQL or PostgreSQL) as the back-end data store.
Update: The Chef 11 release of the Open Source Chef Server entails a complete rewrite of the server API service in Erlang, as described previously. The information at the top of this answer provides more insight into what that all means, with case studies and talks.
Upvotes: 10
Reputation: 2457
@jtimberman sums it up pretty well and you can indeed scale things up to a certain point by splitting up the Chef service onto a number of separate nodes and throwing more resources at it.
By way of a data point, I have seen ~700 clients managed by a single (open source) Chef 10.x server with Solr and CouchDB on separate nodes.
Upvotes: 1