JahMyst

Reputation: 1686

Chef: Cookbook can't depend on older version of other cookbook

Background

Note: Keep in mind for this whole problem that I want to use version 0.0.3 of the environment-cookbook.

Note 2: We NEVER had this problem before. This is recent and we don't know what caused it.

1. environment-cookbook and environment-cookbook-machines

To build domains, we have two cookbooks: environment-cookbook and environment-cookbook-machines.

When I check the Chef-server:

$ knife cookbook list -a | grep environment
environment-cookbook 0.0.3 0.0.4 0.0.5 0.0.6 0.0.7

Looking at environment-cookbook 0.0.3's metadata.json, I see it depends on dir-library 0.13.6:

"dependencies": {
  "dir-library": "= 0.13.6"
}

Whereas environment-cookbook 0.0.4 and higher depend on dir-library 0.13.7:

"dependencies": {
  "dir-library": "= 0.13.7"
}

2. wrapper-domain and wrapper-domain-machines

As each domain can depend on a specific version of environment-cookbook, we use a wrapper cookbook for each of the two: wrapper-domain and wrapper-domain-machines, respectively.

Checking metadata.json for both of the wrappers shows a dependency on environment-cookbook 0.0.3 (this is the version I want):

"dependencies": {
  "environment-cookbook": "= 0.0.3"
  ...
}

wrapper-domain-machines also shows a dependency on environment-cookbook-machines 0.0.3.

The Berksfile.lock for those wrapper cookbooks looks like this:

GRAPH 
environment-cookbook (0.0.3)
    dir-library (= 0.13.6)
environment-cookbook-machines (0.0.3)  # Only here for wrapper-domain-machines
    dir-library (= 0.13.6)

3. Dependency graph

When I run berks viz on my wrapper-domain-machines cookbook, I get the following dependency graph: [image: dependency graph]

SO EVERYTHING SEEMS FINE.

Problem

When I run the domain build through a CI Job on Hudson, I see the following at the beginning of the log file:

INFO: Using dir-library (0.13.6)
INFO: Using environment-cookbook (0.0.3)
INFO: Using environment-cookbook-machines (0.0.3)
INFO: Installing environment-cookbook (0.0.3) from chef-server-url
INFO: Using dir-library (0.13.6)

A little bit further:

INFO: Run List is [recipe[wrapper-domain-machines::up-machines]]
INFO: Run List expands to [wrapper-domain-machines::up-machines]
INFO: Loading cookbooks [[email protected], [email protected], [email protected]]

So far so good. It's using version 0.0.3 for environment-cookbook and 0.13.6 for dir-library.

Later on during the build:

INFO: Run List is [recipe[environment-cookbook::prepare_machine]]
INFO: Run List expands to [environment-cookbook::prepare_machine]
INFO: Starting Chef Run for domain.company.com
INFO: Running start handlers
INFO: Start handlers complete.
resolving cookbooks for run list: ["environment-cookbook::prepare_machine"]
INFO: Loading cookbooks [[email protected], [email protected]]

STOP, WHAT ?

INFO: Loading cookbooks [[email protected], [email protected]]

What we tried so far

  1. Delete cached cookbooks

    • Delete the cookbooks from .berkshelf/cookbooks
    • Re-run the build (downloads automatically from chef-server)
    • Still picks up 0.0.8
  2. Check for dependencies on environment-cookbook in other cookbooks: NONE.

  3. Delete and re-install all versions of environment-cookbook from 0.0.3 to 0.0.7: no luck.

  4. Clean chef clients and nodes before re-running: no luck either.

Questions

This is really a show-stopper for us.

Ask me any further clarifications and I'll update this post.

Upvotes: 1

Views: 1199

Answers (1)

Mark O'Connor

Reputation: 77941

I suspect this is the same problem with Chef that we ran into. It boils down to run-time revision control, as opposed to compile-time revision control.

Lesson learned: Unconstrained cookbook versions at run-time are dangerous when running chef at scale.

Background

You're using Berkshelf to manage your cookbook dependencies. That's great and will ensure the correct versions get loaded into the Chef server. The subtle problem is that each cookbook has its own dependency tree. At run-time, when you add multiple cookbooks to a node's run-list, the Chef server must calculate a fresh tree of dependencies. The problem can appear random because it depends on the combination of cookbooks you have on the run-list. The more cookbooks, the more potential for conflict.
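To make the failure mode concrete, here is a minimal Ruby sketch of what an unconstrained run-list item can resolve to. It is a deliberately simplified model and not Chef's actual solver: pick_version is a hypothetical helper, RubyGems' Gem::Requirement stands in for Chef's constraint handling, and the version list is the one from the knife output in the question.

```ruby
require "rubygems"

# Versions of environment-cookbook on the server, per the question's
# `knife cookbook list -a` output.
AVAILABLE = %w[0.0.3 0.0.4 0.0.5 0.0.6 0.0.7].map { |v| Gem::Version.new(v) }

# Hypothetical helper (an assumption for illustration): return the highest
# available version that satisfies the constraint. With no constraint given,
# every version qualifies, so the newest one wins.
def pick_version(available, constraint = ">= 0")
  requirement = Gem::Requirement.new(constraint)
  available.select { |v| requirement.satisfied_by?(v) }.max
end

pinned   = pick_version(AVAILABLE, "= 0.0.3") # what the wrapper's metadata asks for
unpinned = pick_version(AVAILABLE)            # a bare run-list item with no constraint

puts "pinned:   #{pinned}"
puts "unpinned: #{unpinned}"
```

In this model the second run in the question, whose run-list names environment-cookbook directly rather than going through the wrapper, carries no "= 0.0.3" constraint at all, so nothing stops the newest uploaded version from being chosen.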

We tried to fix this problem by explicitly setting dependencies in our cookbook metadata files. What we discovered was that Chef would silently fail to resolve the dependency tree for some of our cookbooks and fall back to an older version for which it could calculate dependencies. Very puzzling.

We got closer to the problem when we began to explicitly set the version of the cookbook on the run-list. We began to get error messages that chef was unable to resolve dependencies. What was especially weird was that this problem impacted our production chef server the most. We eventually determined that it was because on production we had all historical versions of our cookbooks loaded. Purging old cookbooks helped, but did not solve our problems.

Chef had worked fine for nearly two years before we discovered these problems. It was time and scale that exposed a fatal flaw in our system. At run-time you need to fix the versions of your cookbooks to match the configurations you have previously tested.

Analysis

Coming from a Java background, I equated the problem to how we can run multiple applications on a Tomcat server.

Maven is the build tool that manages each app's dependencies and creates a package for upload into Tomcat. In Chef it is Berkshelf that fulfils this function.

The big difference is at run-time. Tomcat creates a separate classpath for the jars belonging to each application. This provides strong isolation between applications at run-time, safely allowing them to run different versions of the same library. Chef has no equivalent, and this is the impossible problem it faces: at run-time chef-client only runs a single set of cookbooks.

Solutions

While I'm not a fan of policy files I present them as the option favoured by Chef.

Policy files

While most users are oblivious to the problem being solved, Chef has developed a new feature called policy files.

In a nutshell, what they're doing is setting a node's run-list, and the cookbook versions behind it, in advance, at compile time.

One big benefit of policy files is that they result in a faster Chef run. The Chef server no longer has to figure out the large dependency tree, which can be a big saving in Chef installations with large numbers of cookbooks.

Environment cookbook pattern

Personally I'm not a fan of policy files, because I had already discovered the Environment cookbook pattern, a poorly understood but powerful feature of Chef that already existed.

Now every time I deploy cookbooks I use a Chef environment, the natural way to provide isolation in Chef (did I point out it was poorly understood?). Here's an example using Berkshelf:

berks upload 
berks apply my_app_cookbook_version1

The very handy "apply" command will use the Berkshelf lock file to update the cookbook versions in the environment "my_app_cookbook_version1". Now you've fixed the run-time to match your tested conditions.
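As a rough illustration of what that amounts to, the Ruby sketch below turns the GRAPH section of a lock file into the cookbook_versions pins of an environment. It is not Berkshelf's own code: locked_versions is a hypothetical helper, the sample lock text is an assumed, re-indented version of the one in the question (with a top-level dir-library entry added for illustration), and the parser only handles that simple layout.

```ruby
require "json"

# Illustrative lock file excerpt (an assumption, modelled on the question).
# Top-level entries are indented two spaces, their dependencies four.
LOCKFILE_GRAPH = <<~LOCK
  GRAPH
    environment-cookbook (0.0.3)
      dir-library (= 0.13.6)
    environment-cookbook-machines (0.0.3)
      dir-library (= 0.13.6)
    dir-library (0.13.6)
LOCK

# Hypothetical helper: collect every top-level GRAPH entry as an exact pin.
def locked_versions(lock_text)
  in_graph = false
  lock_text.each_line.with_object({}) do |line, locks|
    if line.strip == "GRAPH"
      in_graph = true
      next
    end
    next unless in_graph

    # Matches "  name (1.2.3)"; the deeper-indented dependency lines do not
    # match because the name must start right after exactly two spaces.
    match = line.match(/\A {2}(\S+) \((\d+(?:\.\d+)*)\)\s*\z/)
    locks[match[1]] = "= #{match[2]}" if match
  end
end

# The environment name is the example one used in this answer.
environment = {
  "name" => "my_app_cookbook_version1",
  "cookbook_versions" => locked_versions(LOCKFILE_GRAPH)
}

puts JSON.pretty_generate(environment)
```

Every cookbook in the tested graph ends up with an exact "= x.y.z" constraint, so a node in that environment should only be offered those versions, whatever its run-list says and whatever else is uploaded later.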

The consequence of course is that I have an environment per application cookbook:

  • my_app_cookbook_version1
  • my_app_cookbook_version2
  • etc

This is actually a bonus for me, because it enables me to bootstrap infrastructure against something I've already tested:

knife bootstrap --environment my_app_cookbook_version1 ...

It creates predictability and means loading new cookbooks is not going to magically change my servers in production.

A bonus is that environments provide a record of the cookbook versions in use and a convenient place to set override attributes associated with deployment, like "app_owner", "app_version", etc.

Apologies for the long posting.

Upvotes: 3
