Eugen Mayer

Reputation: 9906

Consul-Agent architecture .. the node-id issue after upgrading to 0.8.1 - conceptual issue?

I am not sure where the root of my problem actually lies, so I will try to explain the bigger picture.

In short, the symptom: after upgrading Consul from 0.7.3 to 0.8.1, my agents (explained below) could no longer connect to the cluster leader due to duplicated node IDs (why that probably happens is also explained below). I could neither fix it with https://www.consul.io/docs/agent/options.html#_disable_host_node_id nor fully understand why I run into this at all .. and that is where the bigger picture and maybe even several different questions come from.

I have the following setup:

  1. I run an application stack with about 8 containers for different services (different microservices, DB types and so on).

  2. I use a single Consul server per stack (yes, the Consul server runs inside the software stack; it has its reasons, because I need this to be offline-deployable and every stack lives for itself).

  3. The Consul server handles registration, service discovery and also the KV store / configuration.

  4. Important/questionable: every container runs a Consul agent started with "consul agent -config-dir /etc/consul.d", connecting to this one server. The agent configuration pulls in other files from that directory which hold the gossip encryption key and the ACL token (the actual files are not included here; a rough sketch follows after this list). Do not wonder about servicename() .. it is replaced by an m4 macro at image build time.

  5. The clients are secured by a gossip key and ACL tokens.

  6. Important: All containers are on the same hardware node

  7. The server configuration, in case it matters, looks similar and has ACLs enabled; the ACL master token, the client token and the gossip key sit as separate JSON files in that configuration folder (again, roughly sketched below).
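
The actual files are not included here, so the following is only a minimal sketch of what such a setup typically looks like on Consul 0.7/0.8; file names, addresses and tokens are placeholders, not my real values.

Per container, something in the spirit of /etc/consul.d/agent.json:

    {
      "datacenter": "dc1",
      "data_dir": "/var/consul",
      "node_name": "servicename()",
      "retry_join": ["consul-server"],
      "acl_datacenter": "dc1"
    }

plus a separate fragment in the same directory holding the secrets, e.g. /etc/consul.d/secrets.json:

    {
      "encrypt": "<gossip key>",
      "acl_token": "<client ACL token>"
    }

On the server side the sketch would look roughly like:

    {
      "server": true,
      "bootstrap_expect": 1,
      "datacenter": "dc1",
      "data_dir": "/var/consul",
      "client_addr": "0.0.0.0",
      "acl_datacenter": "dc1",
      "acl_default_policy": "deny",
      "acl_down_policy": "extend-cache",
      "acl_master_token": "<master token>",
      "encrypt": "<gossip key>"
    }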


Sorry for the probably TL;DR-worthy wall of text above, but the reason behind all this explanation is the multi-agent setup (or rather: one agent per container).

My reasons for that:

  1. I use tiller to configure the containers, so the diplomat gem will usually try to connect to localhost:8500. To accomplish that without making the Consul configuration extraordinarily complicated, I use this local agent, which forwards the requests to the actual server and thus handles all the encryption-key / ACL negotiation.

  2. I use several 'consul watch' tasks on the server to trigger re-configuration; they also run against localhost:8500 without any extra configuration (see the sketch right after this list).
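
To illustrate what those watch tasks are: such a watch can be defined in the agent's config dir like this (key and handler are placeholders, not my actual ones):

    {
      "watches": [
        {
          "type": "key",
          "key": "myapp/config/version",
          "handler": "/usr/local/bin/reconfigure.sh"
        }
      ]
    }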

That said, the main reason I run one agent per container is simplicity: local services can talk to the Consul backend without really knowing about authentication, as long as they connect through 127.0.0.1:8500 (that is the level of security I settled on).
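
As an example of what that looks like in practice, a container just drops a plain service definition into the local agent's config dir and never touches a token itself; service name, port and check below are placeholders:

    {
      "service": {
        "name": "myservice",
        "port": 8080,
        "tags": ["myapp"],
        "check": {
          "http": "http://127.0.0.1:8080/health",
          "interval": "10s"
        }
      }
    }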

Final Question:

Is the Consul agent actually designed to be used that way, i.e. multiple agents on the same host, one per container? The reason I ask is that, as far as I understand, the node-id duplication issue I now get when starting a 0.8.1 agent comes from "the host" being the same, i.e. the hardware node being identical for all Consul agents .. right?

Is my design wrong, or do I need to generate my own node IDs from now on and everything is just fine?
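
In case generating the node IDs myself is the way to go, I assume it would be something like one small fragment per container (the UUID is just a placeholder, e.g. produced with uuidgen at deploy time; the -node-id command line flag should do the same):

    {
      "node_id": "d8a1f0c2-3b4e-4c5d-9e6f-7a8b9c0d1e2f"
    }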

Upvotes: 1

Views: 2409

Answers (1)

Eugen Mayer

Reputation: 9906

It seems this issue has been identified by Hashicorp and addressed in https://github.com/hashicorp/consul/blob/master/CHANGELOG.md#085-june-27-2017, where -disable-host-node-id has been set to true by default. The node-id is therefore no longer derived from the host hardware but is a random UUID, which solves the issue I had when running several Consul nodes on the same physical hardware.

So the way I deployed it was fine.
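
For versions between 0.8.1 and 0.8.4 the same behaviour can be opted into explicitly, either with the -disable-host-node-id flag or a small config fragment like the following (just a sketch of the documented option):

    {
      "disable_host_node_id": true
    }

Note that, if I remember correctly, the agent persists its ID in <data_dir>/node-id, so an already generated ID may have to be removed before the option takes effect.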

Upvotes: 1
