Reputation: 325
Suppose I have a Redis cluster with nodes 10.0.0.1, 10.0.0.2, 10.0.0.3 and 10.0.0.4, which I'm using as a cache.
Then, for whatever reason, node 10.0.0.4 fails and goes down. This brings down the entire cluster:
2713:M 13 Apr 21:07:52.415 * FAIL message received from [id1] about [id2]
2713:M 13 Apr 21:07:52.415 # Cluster state changed: fail
This causes every query to fail with "CLUSTERDOWN The cluster is down".
However, since I'm using the cluster as a cache, I don't really care if a node goes down. A key can get resharded to a different node and lose its contents without affecting my application.
Is there a way to set up such an automated resharding?
Upvotes: 4
Views: 3084
Reputation: 647
Assuming your current cluster has only master nodes, you will definitely get the CLUSTERDOWN error: the failed master has no replica, so its hash slots are no longer covered, and Redis considers the cluster unsafe and returns an error.
Solution
Join a new node to the cluster:
redis-trib.rb add-node 127.0.0.1:6379 EXISTING_MASTER_IP:EXISTING_MASTER_PORT
Make the new node a slave of 10.0.0.4:
redis-cli -p 6379 cluster replicate NODE_ID_OF_TARGET_MASTER
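The NODE_ID_OF_TARGET_MASTER placeholder is the node ID that CLUSTER NODES prints in the first column of each line. A minimal sketch of looking it up by address, assuming the usual "id address flags ..." line layout; the helper name node_id_for and the host used in the usage comment are made up for illustration:

```shell
# Hypothetical helper: reads CLUSTER NODES output on stdin and prints the
# ID of the node whose address matches $1 (e.g. 10.0.0.4:6379).
node_id_for() {
    awk -v addr="$1" '{
        split($2, parts, "@")          # newer Redis prints ip:port@cport
        if (parts[1] == addr) print $1 # first column is the node ID
    }'
}

# Usage (assumes a reachable cluster node at 10.0.0.1:6379):
#   redis-cli -h 10.0.0.1 -p 6379 cluster nodes | node_id_for 10.0.0.4:6379
```

The printed ID can then be passed to the cluster replicate command above.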
To Test
First, make sure the cluster is in good shape (all slots are covered and the nodes agree on the configuration).
redis-trib.rb check 127.0.0.1:6379 (On any master)
Kill the Redis process on 10.0.0.4
Check the cluster and make sure all slots have moved to the new master
redis-trib.rb check 127.0.0.1:6379 (On any master)
No manual action is needed. Additionally, if the cluster has more slaves, they can be promoted in the same way when their own masters fail. (E.g. with 3 masters and 3 slaves: Master1 goes down and Slave1 becomes the new master. If Master1 later rejoins, it comes back as a slave of Slave1 and can be promoted again if Slave1 goes down.)
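Alongside redis-trib.rb check, the overall state can be read from CLUSTER INFO, which reports a cluster_state field that is either ok or fail. A small sketch of extracting it before and after killing the master; the helper name cluster_state is hypothetical:

```shell
# Hypothetical helper: reads CLUSTER INFO output on stdin and prints the
# value of the cluster_state field ("ok" or "fail").
cluster_state() {
    # Replies use CRLF line endings, so strip the carriage returns first.
    tr -d '\r' | awk -F: '$1 == "cluster_state" { print $2 }'
}

# Usage (assumes a cluster node listening on 127.0.0.1:6379):
#   redis-cli -h 127.0.0.1 -p 6379 cluster info | cluster_state
```

If failover worked, the state should return to ok once the slave has been promoted.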
Upvotes: 0
Reputation: 325
I found something close enough to what I need.
Setting cluster-require-full-coverage to "no" lets the rest of the cluster keep responding to queries, although the client needs to handle the possibility of being redirected to a failing node.
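The option has to be set on every node. In redis.conf it is the single line "cluster-require-full-coverage no". The sketch below applies it at runtime instead, assuming your Redis version accepts this option through CONFIG SET (if it does not, edit redis.conf and restart each node). The function name and the REDIS_CLI variable are hypothetical:

```shell
# Path to the client binary; overridable, defaults to redis-cli on PATH.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Hypothetical helper: takes host:port arguments and disables the
# full-coverage requirement on each of those nodes.
set_full_coverage_off() {
    for node in "$@"; do
        host="${node%:*}"   # text before the last colon
        port="${node#*:}"   # text after the first colon
        "$REDIS_CLI" -h "$host" -p "$port" \
            config set cluster-require-full-coverage no
    done
}

# Usage (the surviving nodes from the question):
#   set_full_coverage_off 10.0.0.1:6379 10.0.0.2:6379 10.0.0.3:6379
```

A runtime change made this way is not written back to redis.conf on its own, so the file should be updated as well if the setting must survive a restart.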
Then I can replace the broken node by running:
redis-trib.rb call 10.0.0.1:6379 cluster forget [broken_node_id]
redis-trib.rb add-node 10.0.0.5:6379 10.0.0.1:6379
redis-trib.rb fix 10.0.0.1:6379
Where 10.0.0.5:6379 is the node that will replace the broken one.
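To fill in [broken_node_id], the ID can be taken from CLUSTER NODES, where a node the cluster has marked as failed carries the fail flag in the third column. A rough sketch, assuming that column layout; the helper name failed_node_ids is made up:

```shell
# Hypothetical helper: reads CLUSTER NODES output on stdin and prints the
# IDs of nodes flagged "fail". Nodes flagged only "fail?" (suspected, not
# yet confirmed) are deliberately skipped.
failed_node_ids() {
    awk '$3 ~ /(^|,)fail(,|$)/ { print $1 }'
}

# Usage (assumes 10.0.0.1:6379 is a healthy node):
#   redis-cli -h 10.0.0.1 -p 6379 cluster nodes | failed_node_ids
```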
Upvotes: 1