Craig H

Reputation: 2071

How to handle failover in a Redis cluster with ConnectionMultiplexer?

I have a 6 node Redis cluster up and running with, as you would expect, 3 slaves and 3 masters.
From the Redis server's point of view everything seems hunky dory: I can run CLUSTER FAILOVER or DEBUG SEGFAULT on a server and the appropriate slave becomes the master.

On the .NET side, I have been following the StackExchange.Redis documentation, so I have a static IConnectionMultiplexer, from which I get an IDatabase, and from there I can store and retrieve values.

I am using a connection string like: srv1:7001,srv2:7001,srv3:7001,srv1:7002,srv2:7002,srv3:7002

So far, so good.

However, I am trying to figure out how to handle a master failure in the cluster when using the connection multiplexer. At the moment, the best I can come up with is to catch an exception, discard the current connection multiplexer and create a whole new one, which feels a bit icky.

Given that I tell the multiplexer all the potential endpoints when I connect, I was expecting it to keep an eye on anything like this and start talking to the new master automatically. Some of the documentation alludes to this too saying:

Likewise, when the configuration is changed (especially the master/slave configuration), it will be important for connected instances to make themselves aware of the new situation (via INFO, CONFIG, etc - where available). StackExchange.Redis does this by automatically...

If I kill off srv1:7001 and do nothing, then the connection never recovers, and every call to set a new value throws up a RedisConnectionException.

I have tried attaching to the ConfigurationChanged, ConfigurationChangedBroadcast and ConnectionFailed events, and subscribing to various channels in the hope of seeing a broadcast when the master goes down. None of these seem to fire when I cause a master to change in the cluster.

So I am wondering whether I am missing something here.

Cheers,
Craig.

Upvotes: 1

Views: 2183

Answers (1)

Craig H

Reputation: 2071

After some playing around I noticed that the multiplexer did reconfigure itself eventually. Initially I noticed this while I was stopped in debug mode checking something, and when I carried on, it had unexpectedly started working again.

If I set configCheckSeconds=1 in the connection string, rather than leaving it at the default of 60, the reconfiguration takes place much more promptly, so I am assuming the configuration check interval is the culprit.
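For reference, a minimal sketch of that setting, assuming the endpoints from the question and the StackExchange.Redis NuGet package. The key name and log messages are made up for illustration, and the event handlers are only there to make a reconfiguration visible while testing.

```csharp
using System;
using StackExchange.Redis;

class Program
{
    static void Main()
    {
        // Endpoints from the question, with configCheckSeconds lowered
        // from its default of 60 to 1.
        var options = ConfigurationOptions.Parse(
            "srv1:7001,srv2:7001,srv3:7001,srv1:7002,srv2:7002,srv3:7002,configCheckSeconds=1");

        // The same setting can also be applied on the parsed options object.
        options.ConfigCheckSeconds = 1;

        using (var muxer = ConnectionMultiplexer.Connect(options))
        {
            // Log connection and configuration events while failing over a master.
            muxer.ConnectionFailed += (s, e) =>
                Console.WriteLine($"Connection failed: {e.EndPoint} ({e.FailureType})");
            muxer.ConnectionRestored += (s, e) =>
                Console.WriteLine($"Connection restored: {e.EndPoint}");
            muxer.ConfigurationChanged += (s, e) =>
                Console.WriteLine($"Configuration changed: {e.EndPoint}");

            // Hypothetical key/value, just to exercise the connection.
            IDatabase db = muxer.GetDatabase();
            db.StringSet("failover-test", "value");
            Console.WriteLine(db.StringGet("failover-test"));
        }
    }
}
```

In a real application the multiplexer would stay static and long-lived, as in the question, rather than being disposed straight away as in this sketch.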

I'm not sure how much overhead lowering this setting so dramatically will add. I guess that in general usage a cluster node failing is fairly unlikely, so it is not necessary to check the configuration very often. I've just created an extreme scenario with my testing.

Upvotes: 1
