Chef Configuration Constructs by Example

Question

I am trying to wrap my head around Chef and its many configuration constructs:

[Image: diagram of Chef's configuration constructs]

So we have:

Nodes
Run Lists
Roles
Recipes
Attributes
Environments
Cookbooks
Templates
and even 1 more thing not on that diagram: Data Bags

This is a bit overwhelming. After reading pretty deep into the Chef docs, I have the following understanding of everything:

A Node (devmyapp01) is a machine that Chef will manage configuration for. That Node belongs to an Environment (myapp-dev) and it has a Run List which is a set of Roles (mysql-database). Each Role has a Recipe which itself can have 0+ parameterizable Attributes which may be different across the different Environments. For instance, the mysql-database role may have a Recipe which contains a MAX_TABLE_SIZE Attribute, which is the max size a particular table can grow to. Perhaps in DEV it is set to 256 MB but in PROD it is 16 GB, etc. However, this is different than a Data Bag which, like an Attribute, belongs to a Recipe, but instead of being a key-value pair, is basically a JSON ball. A Cookbook is a collection of Recipes that somehow transcends Roles. A Template is a templated Cookbook that allows some kind of additional layer of parameterization/customization.

Now I'm sure my understanding is either flat out wrong or is at least somewhat misled. Can some battle-weary Chef veteran take each of these concepts above and give a specific, concrete example for each of them in actual use? If you'd like to stick with my MySQL database example, what might the different Nodes, Run Lists, Roles, Recipes, Attributes, Environments, Cookbooks, Templates and Data Bags look like for a Chef configuration managing a MySQL DB? If I could see an actual, practical, concrete example of all these constructs I might actually be able to wrap my head around Chef :-).

Tensibai · Accepted Answer

What is a cookbook?

A cookbook has many parts, each corresponding to a directory:

attributes : This directory contains Ruby files that define the default attributes used later in the recipes.
recipes : Contains Ruby files defining resources (directories, files, services) and the desired state for them.
files : Here you store static files you wish to deploy on hosts (a login banner, for example).
resources, providers, libraries : These are used to define custom resources or helpers to use in recipes.
templates : Here you define file templates. They are a way to define files that are host dependent: you can pass variables to the templates and use the node attributes inside them. For example, setting the number of threads in mysql to 3 times the number of CPUs could be computed with node['cpu']['real']*3.
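Chef templates are ERB files, so the template idea above can be sketched with Ruby's standard erb library. This is only an illustration: the node hash is a hand-made stand-in for the real node object (which Ohai fills in during a run), and the file content and the setting name are hypothetical.

```ruby
require 'erb'

# Stand-in for the node object; in a real run Ohai provides node['cpu'].
node = { 'cpu' => { 'real' => 2 } }

# What a hypothetical templates/my.cnf.erb could contain.
template_source = <<~TEMPLATE
  [mysqld]
  innodb_thread_concurrency = <%= node['cpu']['real'] * 3 %>
TEMPLATE

# Render the template with the local variables of this script in scope.
rendered = ERB.new(template_source).result(binding)
puts rendered
```

On a host with 2 CPUs the rendered file gets the value 6; on a host with 8 CPUs the same template would produce 24, which is the whole point of a template over a static file.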

At the root of the cookbook you'll find a mandatory file named metadata.rb, which defines the cookbook name, its version and its dependencies.

What are dependencies?

Let's take the database and mysql cookbooks.

In the mysql cookbook there are specific resources defined, like mysql_user, mysql_database and so on. The database cookbook uses these resources, so it depends on the code located in the mysql cookbook.

That's why you'll find a line like depends 'mysql', '>= 5.0.0' in its metadata.rb. This line tells chef to load the mysql cookbook in version 5.0.0 or higher, if available on the chef-server, when the database cookbook is loaded.
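To see what a constraint like '>= 5.0.0' means in practice, here is a small sketch. Chef has its own constraint handling; this uses RubyGems' Gem::Requirement purely as a stand-in with the same meaning for this operator, and the list of available versions is made up.

```ruby
require 'rubygems'

# The constraint from the depends line in metadata.rb.
constraint = Gem::Requirement.new('>= 5.0.0')

# Hypothetical versions of the mysql cookbook uploaded to the server.
available = %w[4.2.0 5.0.0 5.3.1]

# Keep only the versions that satisfy the dependency.
usable = available.select { |v| constraint.satisfied_by?(Gem::Version.new(v)) }
puts usable.inspect
```

Version 4.2.0 is rejected, so if it were the only one on the server the dependency could not be resolved.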

What is a node?

Short one: a computer on which chef-client is run. Long one: a target system where chef-client is run to bring the system to the desired state. The node is also the object in chef where the run_list is stored, along with attributes about the node (ram, cpu, jdk version, mysql version, etc.).

The attributes of a node come from different sources: automatic attributes gathered by ohai (ram, number of cpus, disks, os type, etc.) are merged with attributes from the loaded cookbooks, attributes from roles and attributes from the environment the node belongs to. See the Chef documentation on attributes for the details on this part.
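A rough way to picture the merge is a chain of hash merges where later sources win. The sketch below is deliberately simplified: it only models default-level attributes plus the automatic ones, while real Chef has several more precedence levels. All attribute names and values are hypothetical.

```ruby
# Recursively merge two hashes; on conflict the value from `extra` wins.
def deep_merge(base, extra)
  base.merge(extra) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

cookbook_defaults    = { 'mysql' => { 'port' => 3306, 'max_connections' => 100 } }
environment_defaults = { 'mysql' => { 'max_connections' => 500 } }
role_defaults        = { 'mysql' => { 'bind_address' => '0.0.0.0' } }
automatic            = { 'cpu' => { 'real' => 2 } } # gathered by ohai

# Later layers override earlier ones.
layers = [cookbook_defaults, environment_defaults, role_defaults, automatic]
node_attributes = layers.reduce({}) { |merged, layer| deep_merge(merged, layer) }

puts node_attributes.inspect
```

The environment value for max_connections replaces the cookbook default, while port and bind_address survive because nothing later overrides them.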

What is the runlist?

The runlist is an attribute of the node which lists the recipes and roles we want to apply to this node.

What is a Role?

A role is a kind of helper: it gives you a way to define attributes and a runlist to apply to one or many nodes. The consensus is to avoid setting attributes in roles, as they're not version-controlled on the server side and are generally cross-environment.

An easy-to-understand drawback is a password change on mysql when you already have a setup online. You have 3 servers, in dev, QA and PROD. If you make a change in the role, it will be applied to all environments at once, when you probably wanted to restrict it to dev, then QA, and then PROD once the tests are OK.

The workaround is to use a cookbook to do the same: use depends in the metadata.rb and include_recipe in this wrapper cookbook to define the runlist, and use this cookbook's attributes files to set the common attributes, as you would in a role.
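As a rough illustration of such a wrapper cookbook, the fragment below shows the three files involved. The cookbook name my-mysql, the attribute and the mysql::server recipe name are assumptions made for the example; check which recipes and attributes the wrapped cookbook version really provides.

```ruby
# my-mysql/metadata.rb
name    'my-mysql'
version '1.0.0'
depends 'mysql', '>= 5.0.0'

# my-mysql/attributes/default.rb
# Common attributes that would otherwise live in the role (hypothetical key).
default['mysql']['max_connections'] = 500

# my-mysql/recipes/default.rb
# Plays the part of the role's runlist (recipe name is an assumption).
include_recipe 'mysql::server'
```

Because this is a cookbook, it has a version, so a change can be released as a new version and rolled out one environment at a time.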

What is an environment?

An environment is a logical group for your nodes. You can follow dev/QA/PROD as your working envs, or you may go by kind of system (web-servers/db-servers), and possibly mix both (Dev_web-servers, Dev_db-server, and so on). A node can belong to only one environment.

An environment can host attributes too, usually a dns server, an smtp server specific to this environment, etc. Same warning as for roles: they are not version-controlled, but the scope is narrower here as it targets a logical group of nodes.

The main interest of environments is the cookbook version limitation. You can control which version of which cookbook is available in each environment. This comes in handy when you're working on a new version of a cookbook but don't want it to be applied on all your servers. If you change some mysql parameter in a my-mysql cookbook attributes file, you'll wish to control when the change reaches each environment. The limitation helps you there: have QA and PROD limit your my-mysql cookbook to version A while your dev environment is allowed to use version B.
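The effect of per-environment limitations can be sketched as follows. The version numbers and constraints are invented, and Gem::Requirement is again only a stand-in for Chef's own constraint handling; the idea shown is that each environment gets the highest published version its constraint allows.

```ruby
require 'rubygems'

# Hypothetical published versions of the my-mysql cookbook.
published = %w[1.0.0 1.1.0 2.0.0].map { |v| Gem::Version.new(v) }

# Hypothetical constraints: dev is unrestricted, QA and PROD are pinned.
environment_constraints = {
  'dev'  => Gem::Requirement.new('>= 0'),
  'qa'   => Gem::Requirement.new('= 1.1.0'),
  'prod' => Gem::Requirement.new('= 1.1.0')
}

# For each environment, pick the highest version that satisfies its constraint.
selected = environment_constraints.transform_values do |constraint|
  published.select { |v| constraint.satisfied_by?(v) }.max.to_s
end

puts selected.inspect
```

Here dev picks up 2.0.0 as soon as it is uploaded, while QA and PROD stay on 1.1.0 until their constraint is changed.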

What is a DataBag?

As its name implies, a databag is a store of data. It's a group of JSON files, each being a DataBagItem whose JSON is converted to a mash to be used in recipes when loaded.

The goal of databags is to store read-only data. Updating a databag from a recipe is dangerous: each item is saved as a whole, so two nodes trying to write the same object at the same time will run into a race condition and one change will be lost.
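The lost update can be reproduced with a toy model. In this sketch a plain hash plays the chef-server and two local copies play two nodes; the item name and its contents are hypothetical.

```ruby
require 'json'

# Toy stand-in for the server: one data bag item stored as a JSON document.
server_store = {
  'admins' => JSON.generate({ 'id' => 'admins', 'users' => ['alice'] })
}

# Both nodes load the item before either of them saves it.
copy_a = JSON.parse(server_store['admins'])
copy_b = JSON.parse(server_store['admins'])

# Each node makes its own change on its local copy.
copy_a['users'] << 'bob'
copy_b['users'] << 'carol'

# Each node saves the whole item; the second save overwrites the first.
server_store['admins'] = JSON.generate(copy_a)
server_store['admins'] = JSON.generate(copy_b)

final = JSON.parse(server_store['admins'])
puts final['users'].inspect
```

The change made by the first node ('bob') is gone after the second save, which is why databags are best treated as read-only from recipes.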

The main purpose of databags is to store common objects (a list of admin users, etc) you don't wish to set in a cookbook/role.

So all glued together

In this paragraph I try to give a simplified view of the documentation on the chef-client run.

We have a node running chef-client. It will ask the chef-server (or read the command line) to know its runlist. This runlist will then be expanded (the roles are searched for the recipes to load). At this early stage, ohai is executed to gather the automatic attributes.
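Runlist expansion can be pictured as replacing every role[...] entry with the entries of that role's own runlist until only recipes remain. The sketch below assumes that simplified model; the role and recipe names are hypothetical.

```ruby
# Hypothetical roles and their own runlists.
roles = {
  'base'           => ['recipe[ntp]'],
  'mysql-database' => ['recipe[my-mysql]', 'recipe[backup::client]']
}

# Replace each role[...] entry by the recipes it contains, recursively.
def expand(run_list, roles)
  run_list.flat_map do |item|
    if (match = item.match(/\Arole\[(.+)\]\z/))
      expand(roles.fetch(match[1]), roles)
    else
      [item[/\Arecipe\[(.+)\]\z/, 1]]
    end
  end
end

node_run_list = ['role[base]', 'role[mysql-database]', 'recipe[myapp::deploy]']
expanded = expand(node_run_list, roles)
puts expanded.inspect
```

The result is a flat, ordered list of recipes, which is what the rest of the run works with.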

After that, all the cookbooks are loaded from the server (or from disk in solo/local mode), taking care of limitations and dependencies, and all the attributes files are read.

Now the recipes are compiled: in this phase the recipes' ruby code is executed and a collection of resources is built. At this point nothing has changed on the machine.

Once all recipes have been compiled, the resource collection is ready. Chef goes over it and tries to bring each resource to the desired state, i.e. the convergence phase.

Directories will be created if they don't exist. Templates will be rendered and then compared to their target; if they don't match, the target will be replaced. Services are checked for their state; if the desired state is :start and the service is stopped, chef will try to start it.
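The two phases can be mimicked with a toy model: compiling only records the desired state in a list, and converging compares that list with the machine and changes only what differs. Everything here (the in-memory machine, the resource hashes, the converge helper) is a hypothetical simplification, not Chef's actual implementation.

```ruby
# In-memory stand-in for the state of a real host.
machine = { directories: ['/etc'], services: { 'mysql' => :stopped } }

# Compile phase: evaluating recipes only records resources, nothing is changed.
resource_collection = []
resource_collection << { type: :directory, name: '/etc/mysql' }
resource_collection << { type: :service, name: 'mysql' }

# Converge phase: act only on resources that are not yet in the desired state.
def converge(resources, machine)
  changes = []
  resources.each do |res|
    case res[:type]
    when :directory
      next if machine[:directories].include?(res[:name])
      machine[:directories] << res[:name]
      changes << "created #{res[:name]}"
    when :service
      next if machine[:services][res[:name]] == :running
      machine[:services][res[:name]] = :running
      changes << "started #{res[:name]}"
    end
  end
  changes
end

first_run  = converge(resource_collection, machine)
second_run = converge(resource_collection, machine)
puts first_run.inspect
puts second_run.inspect
```

The first run creates the directory and starts the service; the second run finds everything already in the desired state and changes nothing.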

Once the convergence is done, the node state will be saved on the chef-server.