Reputation: 2071
I have a 6 node Redis cluster up and running with, as you would expect, 3 slaves and 3 masters.
From a Redis server point of view everything seems hunky dory, and I can call cluster failover or debug segfault on a server and the appropriate slave becomes the master.
From the .NET side of things, I have been following the StackExchange.Redis documentation, so I have a static IConnectionMultiplexer, from which I get an IDatabase, and from there I can store and retrieve values.
I am using a connection string like: srv1:7001,srv2:7001,srv3:7001,srv1:7002,srv2:7002,srv3:7002
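For reference, the setup looks roughly like this (a simplified sketch; the static wrapper class is just for illustration):

```csharp
using StackExchange.Redis;

public static class RedisConnection
{
    // One shared multiplexer for the whole application, seeded with all
    // six cluster endpoints from the connection string above.
    public static readonly IConnectionMultiplexer Multiplexer =
        ConnectionMultiplexer.Connect(
            "srv1:7001,srv2:7001,srv3:7001,srv1:7002,srv2:7002,srv3:7002");

    public static IDatabase Database => Multiplexer.GetDatabase();
}
```

Storing and retrieving values is then just RedisConnection.Database.StringSet(...) and StringGet(...).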
So far, so good.
However, I am trying to figure out how to handle a master failure in the cluster when using the connection multiplexer. At the moment, the best I can come up with is to catch an exception, discard the current connection multiplexer and create a whole new one, which feels a bit icky.
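That workaround looks roughly like this (a simplified sketch rather than my exact code; RedisStore and SetWithRetry are just illustrative names):

```csharp
using StackExchange.Redis;

public static class RedisStore
{
    private static IConnectionMultiplexer _multiplexer = Connect();

    private static IConnectionMultiplexer Connect() =>
        ConnectionMultiplexer.Connect(
            "srv1:7001,srv2:7001,srv3:7001,srv1:7002,srv2:7002,srv3:7002");

    // On a RedisConnectionException, discard the multiplexer and rebuild it.
    public static bool SetWithRetry(string key, string value)
    {
        try
        {
            return _multiplexer.GetDatabase().StringSet(key, value);
        }
        catch (RedisConnectionException)
        {
            _multiplexer.Dispose();
            _multiplexer = Connect();
            return _multiplexer.GetDatabase().StringSet(key, value);
        }
    }
}
```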
Given that I tell the multiplexer all the potential endpoints when I connect, I was expecting it to keep an eye on anything like this and start talking to the new master automatically. Some of the documentation alludes to this too saying:
Likewise, when the configuration is changed (especially the master/slave configuration), it will be important for connected instances to make themselves aware of the new situation (via INFO, CONFIG, etc - where available). StackExchange.Redis does this by automatically...
If I kill off srv1:7001 and do nothing, then the connection never recovers, and every call to set a new value throws a RedisConnectionException.
I have tried attaching to the ConfigurationChanged, ConfigurationChangedBroadcast and ConnectionFailed events, and subscribing to various channels, in the hope of seeing a broadcast when the master goes down. None of these seem to trigger when I cause a master to change in the cluster.
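For what it's worth, the hooks are just diagnostic logging along these lines (where multiplexer is the shared IConnectionMultiplexer instance):

```csharp
multiplexer.ConfigurationChanged +=
    (sender, e) => Console.WriteLine($"Configuration changed: {e.EndPoint}");
multiplexer.ConfigurationChangedBroadcast +=
    (sender, e) => Console.WriteLine($"Configuration broadcast: {e.EndPoint}");
multiplexer.ConnectionFailed +=
    (sender, e) => Console.WriteLine($"Connection failed: {e.EndPoint} ({e.FailureType})");
```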
So I am wondering if there is something I am missing here?
Cheers,
Craig.
Upvotes: 1
Views: 2183
Reputation: 2071
After some playing around I noticed that the multiplexer did reconfigure itself eventually. Initially I noticed this while I was stopped in debug mode checking something, and when I carried on, it had unexpectedly started working again.
If I set configCheckSeconds=1 in the connection string, rather than the default of 60, the reconfiguration takes place much more promptly, so I am assuming this is the culprit.
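In other words, something like this (a minimal sketch; either the configCheckSeconds token in the connection string or the ConfigurationOptions property should do the same thing):

```csharp
var options = ConfigurationOptions.Parse(
    "srv1:7001,srv2:7001,srv3:7001,srv1:7002,srv2:7002,srv3:7002");
options.ConfigCheckSeconds = 1;   // default is 60
var multiplexer = ConnectionMultiplexer.Connect(options);
```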
I'm not sure how much overhead changing this setting so dramatically will add. I guess in general usage a cluster node failing is fairly unlikely, so it is not necessary to reconfigure very often; I've just created an extreme scenario in my testing.
Upvotes: 1