Jimmy Chu

Reputation: 982

How to reboot a CoreOS cluster properly?

I would like to reboot my CoreOS cluster nodes one by one, because I have read that rebooting all nodes at once is a bad idea (etcd, Ceph, etc. could lose quorum). What is the proper way to do this, other than logging into each machine manually and issuing a reboot command?

Is there a generic way to reboot n nodes in a cluster, wait for them to be up, and then another set of n nodes, until all nodes are rebooted?

Thank you.

Upvotes: 3

Views: 4735

Answers (2)

Robert Reiz

Reputation: 4433

Locksmith is the CoreOS daemon that coordinates node reboots. I recommend choosing the etcd-lock reboot strategy in your cloud-config:

coreos:
  update:
    reboot-strategy: etcd-lock
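To confirm on a node that this setting actually took effect, you can read the file the strategy ends up in. This is a sketch that assumes cloud-config's `update` section is written to /etc/coreos/update.conf as a `REBOOT_STRATEGY=` line; `reboot_strategy` is a hypothetical helper name, not a CoreOS tool:

```bash
#!/bin/bash
# Hypothetical helper: print the reboot strategy from an update.conf file.
# Assumes cloud-config's update section is written to /etc/coreos/update.conf
# as a line like REBOOT_STRATEGY=etcd-lock. Prints nothing if the key is absent.
reboot_strategy() {
  local conf="${1:-/etc/coreos/update.conf}"
  # Keep only the value of the last REBOOT_STRATEGY= line, minus any quotes.
  sed -n 's/^REBOOT_STRATEGY=//p' "$conf" | tail -n 1 | tr -d '"'
}
```

On a node configured as above, running `reboot_strategy` should print `etcd-lock`.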

By default, the etcd-lock strategy lets only one machine reboot at a time. I use fleetctl to control my CoreOS cluster remotely; this script sends a reboot request to every machine in the cluster:

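#!/bin/bash -x

# Ask every machine in the cluster to reboot through locksmith. With the
# etcd-lock strategy, each node waits until it holds the reboot lock.
for machine in $(fleetctl list-machines --no-legend --full | awk '{ print $1 }'); do
        fleetctl ssh "$machine" "sudo locksmithctl reboot"
done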

If your reboot-strategy is etcd-lock, the nodes will not reboot immediately. Each one waits for the reboot lock, so they reboot one at a time until the whole cluster has rebooted.
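If you would rather not depend on locksmith, the "n nodes at a time, wait, then the next n" loop from the question can also be scripted directly with fleetctl. The sketch below is a hypothetical helper, not part of fleet or locksmith: `rolling_reboot` reboots machines in batches of BATCH_SIZE and, before starting the next batch, waits until each machine answers over `fleetctl ssh` with a new kernel boot ID (read from /proc/sys/kernel/random/boot_id), which shows it really went down and came back. It assumes fleetctl can reach the cluster and that the first column of `fleetctl list-machines --no-legend --full` is the machine ID, as in the script above. Because it runs `sudo reboot` directly, it bypasses locksmith's etcd lock, so pick a batch size small enough to keep etcd's quorum.

```bash
#!/bin/bash
# Hypothetical sketch: reboot fleet machines BATCH_SIZE at a time, waiting
# until every machine in a batch reports a new boot ID before moving on.
# Note: `sudo reboot` bypasses locksmith's etcd lock.

# Print the full ID of every machine registered with fleet, one per line.
list_machines() {
  fleetctl list-machines --no-legend --full | awk '{ print $1 }'
}

# Print a machine's kernel boot ID (empty if it is unreachable).
# The boot ID changes on every boot, so a new value means it really rebooted.
boot_id() {
  fleetctl ssh "$1" "cat /proc/sys/kernel/random/boot_id" 2>/dev/null | tr -d '\r'
}

# Usage: rolling_reboot BATCH_SIZE [POLL_SECONDS]
rolling_reboot() {
  local batch_size="$1" poll="${2:-10}"
  if [[ ! "$batch_size" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: rolling_reboot BATCH_SIZE [POLL_SECONDS]" >&2
    return 2
  fi

  local -a machines=() batch before
  local m i j now
  while read -r m; do
    if [[ -n "$m" ]]; then machines+=("$m"); fi
  done < <(list_machines)

  for ((i = 0; i < ${#machines[@]}; i += batch_size)); do
    batch=("${machines[@]:i:batch_size}")
    before=()
    for j in "${!batch[@]}"; do
      before[j]="$(boot_id "${batch[j]}")"
      echo "rebooting ${batch[j]}"
      # The SSH session drops when the machine goes down, so ignore its exit code.
      fleetctl ssh "${batch[j]}" "sudo reboot" >/dev/null 2>&1 || true
    done
    for j in "${!batch[@]}"; do
      # Poll until the machine answers with a boot ID different from before.
      until now="$(boot_id "${batch[j]}")" && [[ -n "$now" && "$now" != "${before[j]}" ]]; do
        sleep "$poll"
      done
      echo "${batch[j]} is back"
    done
  done
}
```

For example, `rolling_reboot 1 15` reboots one machine at a time and polls every 15 seconds. Checking the boot ID, rather than only whether a machine still answers, avoids mistaking a machine that has not gone down yet for one that has already come back.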

Upvotes: 2

emassa

Reputation: 61

In your cloud-config.yaml file, you can add:

coreos:
  update:
    reboot-strategy: etcd-lock

which means that each machine in your cluster must acquire a lock before rebooting, ensuring that no more than one machine reboots at a time. Please refer to the documentation for additional information: https://coreos.com/docs/cluster-management/setup/update-strategies/
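The lock is a semaphore stored in etcd, and its limit can be raised if you want n machines to reboot at a time instead of one. A minimal sketch, assuming locksmithctl is run on a CoreOS host that can reach etcd; `set_reboot_concurrency` is a hypothetical wrapper name, while `set-max` and `status` are locksmithctl subcommands:

```bash
#!/bin/bash
# Hypothetical wrapper: allow up to N machines to hold locksmith's reboot
# lock at once, then print the semaphore state. Assumes etcd is reachable.

# Usage: set_reboot_concurrency N
set_reboot_concurrency() {
  local n="$1"
  if [[ ! "$n" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: set_reboot_concurrency N (N >= 1)" >&2
    return 2
  fi
  # Set how many machines may reboot concurrently.
  locksmithctl set-max "$n" || return
  # Show the current semaphore: its limit and which machines hold it.
  locksmithctl status
}
```

Because every node reads the same etcd semaphore, running `set_reboot_concurrency 2` on one node should change the limit for the whole cluster.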

Upvotes: 4
