pmann

Reputation: 759

Etcd cluster setup failure

I am trying to set up a 3-node etcd cluster on Ubuntu machines as the Docker data store for networking. I previously created an etcd cluster successfully using the etcd Docker image. Now, when I try to replicate it, the steps fail on one node. Even after removing the failing node from the setup, the cluster still looks for the removed node. I get the same error when using the etcd binary.

I used the following command on all nodes, changing the IP accordingly:

docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 \
 --name etcd quay.io/coreos/etcd \
 -name etcd0 \
 -advertise-client-urls http://172.27.59.141:2379,http://172.27.59.141:4001 \
 -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
 -initial-advertise-peer-urls http://172.27.59.141:2380 \
 -listen-peer-urls http://0.0.0.0:2380 \
 -initial-cluster-token etcd-cluster-1 \
 -initial-cluster etcd0=http://172.27.59.141:2380,etcd1=http://172.27.59.244:2380,etcd2=http://172.27.59.232:2380 \
 -initial-cluster-state new

Two of the nodes connect properly, but the etcd service on the third node stops. The following is the log from the third node:

2016-06-16 17:16:34.293248 I | etcdmain: etcd Version: 2.3.6
2016-06-16 17:16:34.294368 I | etcdmain: Git SHA: 128344c
2016-06-16 17:16:34.294584 I | etcdmain: Go Version: go1.6.2
2016-06-16 17:16:34.294781 I | etcdmain: Go OS/Arch: linux/amd64
2016-06-16 17:16:34.294962 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2016-06-16 17:16:34.295142 W | etcdmain: no data-dir provided, using default data-dir ./node2.etcd
2016-06-16 17:16:34.295438 I | etcdmain: listening for peers on http://0.0.0.0:2380
2016-06-16 17:16:34.295654 I | etcdmain: listening for client requests on http://0.0.0.0:2379
2016-06-16 17:16:34.295846 I | etcdmain: listening for client requests on http://0.0.0.0:4001
2016-06-16 17:16:34.296193 I | etcdmain: stopping listening for client requests on http://0.0.0.0:4001
2016-06-16 17:16:34.301139 I | etcdmain: stopping listening for client requests on http://0.0.0.0:2379
2016-06-16 17:16:34.301454 I | etcdmain: stopping listening for peers on http://0.0.0.0:2380
2016-06-16 17:16:34.301718 I | etcdmain: --initial-cluster must include node2=http://172.27.59.232:2380 given --initial-advertise-peer-urls=http://172.27.59.232:2380

Even after removing the failing node, I can see that the two remaining nodes are still waiting for the third node to connect:

2016-06-16 17:16:12.063893 N | etcdserver: added member 17879927ec74147b [http://172.27.59.232:238] to cluster ba4424e006edb53e
2016-06-16 17:16:12.064431 N | etcdserver: added local member 24d9feabb7e2f26f [http://172.27.59.244:2380] to cluster ba4424e006edb53e
2016-06-16 17:16:12.065229 N | etcdserver: added member 2bda70be57138cfe [http://172.27.59.141:2380] to cluster ba4424e006edb53e
2016-06-16 17:16:12.218560 I | raft: 24d9feabb7e2f26f [term: 1] received a MsgVote message with higher term from 2bda70be57138cfe [term: 29]
2016-06-16 17:16:12.218964 I | raft: 24d9feabb7e2f26f became follower at term 29
2016-06-16 17:16:12.219276 I | raft: 24d9feabb7e2f26f [logterm: 1, index: 3, vote: 0] voted for 2bda70be57138cfe [logterm: 1, index: 3] at term 29
2016-06-16 17:16:12.222667 I | raft: raft.node: 24d9feabb7e2f26f elected leader 2bda70be57138cfe at term 29
2016-06-16 17:16:12.335904 I | etcdserver: published {Name:node1 ClientURLs:[http://172.27.59.244:2379 http://172.27.59.244:4001]} to cluster ba4424e006edb53e
2016-06-16 17:16:12.336459 N | etcdserver: set the initial cluster version to 2.2
2016-06-16 17:16:42.059177 W | rafthttp: the connection to peer 17879927ec74147b is unhealthy
2016-06-16 17:17:12.060313 W | rafthttp: the connection to peer 17879927ec74147b is unhealthy
2016-06-16 17:17:42.060986 W | rafthttp: the connection to peer 17879927ec74147b is unhealthy

As the log shows, even though I started the cluster with only two nodes, it is still searching for the third node.

Is there a location on the local disk where data is being saved, so that it is picking up old data even though none was provided?

Please suggest what I am missing.

Upvotes: 4

Views: 4682

Answers (1)

sel-fish

Reputation: 4486

Is there a location on the local disk where data is being saved, so that it is picking up old data even though none was provided?

Yes, the membership data is already stored in node0.etcd and node1.etcd, the default data directories that etcd uses when no data-dir is provided.

You can see the following message in the log, which indicates that the server already belongs to a cluster:

etcdmain: the server is already initialized as member before, starting as etcd member...

To run a new cluster with two members, add another argument to your command that points etcd at a different data directory:

--data-dir bak
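
As a rough illustration for the etcd binary case, the check below looks for leftover state before starting a member. It is only a sketch under assumptions: that the default data directory is ./<name>.etcd (as the "no data-dir provided" log line suggests), that existing state sits in a member/ subdirectory, and that the helper names has_stale_state and choose_data_dir plus the "-fresh" suffix are hypothetical, not part of etcd.

```shell
#!/bin/sh
# Sketch: detect leftover etcd state for a member name and suggest a data dir.
# Assumptions (not confirmed by the original post): the default data directory
# is <base>/<name>.etcd and old membership data lives under its member/ subdir.

# has_stale_state NAME [BASE_DIR]
# Succeeds (exit 0) when an earlier run left member data behind.
has_stale_state() {
    name="$1"
    base="${2:-.}"
    [ -d "$base/$name.etcd/member" ]
}

# choose_data_dir NAME [BASE_DIR]
# Prints a directory to pass to --data-dir: the default one when it is clean,
# otherwise a hypothetical alternative so the old membership is not reused.
choose_data_dir() {
    name="$1"
    base="${2:-.}"
    if has_stale_state "$name" "$base"; then
        echo "$base/$name-fresh.etcd"
    else
        echo "$base/$name.etcd"
    fi
}
```

For example, the result of choose_data_dir node1 could be passed as the --data-dir value when starting that member. Deleting the old directory would also discard the stale membership, but switching to a new directory keeps the old data around in case it is still needed.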

Upvotes: 4
