fonso

Reputation: 325

Resharding keys when a node goes down in redis cluster

Suppose I have a redis cluster with nodes 10.0.0.1, 10.0.0.2, 10.0.0.3 and 10.0.0.4, which I'm using as a cache.

Then, for whatever reason, node 10.0.0.4 fails and goes down. This brings down the entire cluster:

2713:M 13 Apr 21:07:52.415 * FAIL message received from [id1] about [id2]
2713:M 13 Apr 21:07:52.415 # Cluster state changed: fail

This causes any query to be rejected with "CLUSTERDOWN The cluster is down".

However, since I'm using the cluster as a cache, I don't really care if a node goes down. A key can get resharded to a different node and lose its contents without affecting my application.

Is there a way to set up such an automated resharding?

Upvotes: 4

Views: 3084

Answers (2)

efdestegul

Reputation: 647

Assuming you have only master nodes in your current cluster, you will definitely get a cluster-down error: the failed master has no replica, so Redis considers the cluster unsafe and stops serving queries.

Solution

  • Create a new node (write a redis.conf with the desired parameters; a minimal sketch follows this list).
  • Join that node to the cluster:

    redis-trib.rb add-node 127.0.0.1:6379 EXISTING_MASTER_IP:EXISTING_MASTER_PORT

  • Make the new node a slave of 10.0.0.4:

    redis-cli -p 6379 cluster replicate NODE_ID_OF_TARGET_MASTER
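
For the first step, a minimal redis.conf for a cluster-enabled node might look like the sketch below; the port and timeout values are assumptions, so adjust them to your setup:

    port 6379
    cluster-enabled yes
    cluster-config-file nodes.conf
    cluster-node-timeout 5000
    appendonly yes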

To Test

  • First, be sure the cluster is in good shape (all slots are covered and the nodes agree on the configuration):

    redis-trib.rb check 127.0.0.1:6379 (On any master)

  • Kill the 10.0.0.4 process.

  • Wait for the slave to become the new master. (This happens quickly; the slots assigned to 10.0.0.4 are taken over by the slave automatically.)
  • Check the cluster and make sure all slots have moved to the new master:

    redis-trib.rb check 127.0.0.1:6379 (On any master)
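
To watch the failover as it happens, you can also poll the cluster state from any surviving node; the exact flags in the output depend on your topology:

    redis-cli -p 6379 cluster nodes (the failed master is flagged "fail", its slave becomes "master")
    redis-cli -p 6379 cluster info (cluster_state returns to "ok" once the promotion completes)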

No manual action is needed. Additionally, if you have more slaves in the cluster, they can be promoted to cover other failed masters as well. (e.g. in a setup of 3 masters and 3 slaves, when Master1 goes down Slave1 becomes the new master; if that new master later fails, another slave can be promoted in its place.)

Upvotes: 0

fonso

Reputation: 325

I found something close enough to what I need.

By setting cluster-require-full-coverage to "no", the rest of the cluster will continue to respond to queries, although the client needs to handle the possibility of being redirected to a failing node.
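
For reference, this is a one-line setting in redis.conf on each node, and it can also be applied at runtime (a runtime change is lost on restart unless redis.conf is updated as well):

    cluster-require-full-coverage no (in redis.conf)
    redis-cli -p 6379 config set cluster-require-full-coverage no (at runtime)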

Then I can replace the broken node by running:

redis-trib.rb call 10.0.0.1:6379 cluster forget [broken_node_id]
redis-trib.rb add-node 10.0.0.5:6379 10.0.0.1:6379
redis-trib.rb fix 10.0.0.1:6379

Where 10.0.0.5:6379 is the node that will replace the broken one.
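
After the fix step, it's worth verifying that all 16384 slots are covered again:

    redis-trib.rb check 10.0.0.1:6379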

Upvotes: 1
