Reputation: 1303
Sorry for the long post, but I hope it will save us some clarifying questions. I also added some diagrams to break up the wall of text; I hope you'll like them.
We are in the process of moving our current solution to local Kubernetes infrastructure, and what we are currently investigating is the proper way to set up a KV store (we've been using Redis for this) in K8s.
One of the main use cases for the store is giving processes exclusive ownership of resources via a simple version of the distributed lock pattern, as in the (discouraged) pattern here. (More on why we are not using Redlock below.)
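For readers unfamiliar with it, here is a minimal sketch of the single-instance lock pattern referred to above: SET with NX and a TTL to acquire, and a compare-and-delete script to release. It assumes a redis-py style client exposing `set(..., nx=, px=)` and `eval(...)`. The names `acquire`, `release` and `FakeClient` are made up for illustration, and `FakeClient` is only an in-memory stand-in so the sketch runs without a server.

```python
import secrets
import time

# Compare-and-delete: remove the key only if it still holds our token.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""


def acquire(client, resource, ttl_ms):
    """Try to take the lock; return a random token on success, else None."""
    token = secrets.token_hex(16)
    # Equivalent to: SET resource token NX PX ttl_ms
    if client.set(resource, token, nx=True, px=ttl_ms):
        return token
    return None


def release(client, resource, token):
    """Release the lock only if we still own it."""
    return client.eval(RELEASE_SCRIPT, 1, resource, token) == 1


class FakeClient:
    """Hypothetical in-memory stand-in for a single Redis/KeyDB node."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry in time.monotonic() seconds)

    def _live(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] <= time.monotonic():
            del self._data[key]  # drop expired entries lazily
            return None
        return entry

    def set(self, name, value, nx=False, px=None):
        if nx and self._live(name) is not None:
            return None  # mirrors the None a real client returns when NX fails
        expiry = float("inf") if px is None else time.monotonic() + px / 1000
        self._data[name] = (value, expiry)
        return True

    def eval(self, script, numkeys, *keys_and_args):
        # Emulates RELEASE_SCRIPT only; a real server runs the Lua itself.
        key, token = keys_and_args
        entry = self._live(key)
        if entry is not None and entry[0] == token:
            del self._data[key]
            return 1
        return 0
```

Against a real deployment, a redis-py client would be passed in place of `FakeClient`. The random token is what keeps one client from releasing a lock that has already expired and been taken by another.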
Once again, we are looking for a way to set it up in K8s so that the details of the HA setup are opaque to clients. Ideally, the setup would look like this:
So what is the proper way to set up Redis for this? Here are the options that we considered:
First of all, we discarded Redis Cluster, because we don't need to shard the keyspace. Our keyspace is rather small.
Next, we discarded the Redis Sentinel setup, because with Sentinel the clients are expected to connect directly to the chosen Redis node, so we would have to expose all nodes. We would also have to give each node some identity (distinct ports, etc.), which contradicts the idea of a K8s Service. Even worse, we would have to check that all of our (heterogeneous) clients support the Sentinel protocol and implement all that fiddling properly.
Somewhere around here we ran out of options for the first time. We thought about using regular Redis replication, but without Sentinel it's unclear how to set things up for fault tolerance in case of a master failure: there seems to be no auto-promotion for replicas, and no (easy) way to tell K8s that the master has changed, except maybe by inventing a custom K8s operator, but we are not that desperate (yet).
So we came to the idea that Redis may not be very cloud-friendly, and started looking for alternatives. That is how we found KeyDB, which has promising additional modes. That's in addition to a significant performance boost and a 100% compatible API: very impressive!
So here are the options that we considered with KeyDB:
This setup looks very promising at first: it is simple and clear, and even the official KeyDB docs recommend it as the preferred HA setup, superior to a Sentinel setup.
But there's a caveat. While the docs advocate this setup as tolerant of split-brain (because the nodes catch up with one another after connectivity is re-established), a split-brain would ruin our use case, because two clients would be able to lock the same resource ID:
And there's no way to tell K8s that one node is OK, and another is unhealthy, because both nodes have lost their replicas.
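The failure mode can be illustrated with a toy model of two writable nodes. This is a simplified simulation, not KeyDB code: `Node`, `link` and `partition` are made-up names, and replication is modelled as instantaneous while the link is up, which real asynchronous replication is not.

```python
class Node:
    """Toy model of one writable node in a two-node active-replica pair."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peer = None
        self.connected = True  # whether the replication link is up

    def set_nx(self, key, value):
        """Exclusive set: succeeds only if this node has not seen the key."""
        if key in self.data:
            return False
        self.data[key] = value
        # While the link is up, the write reaches the peer right away.
        if self.connected and self.peer is not None:
            self.peer.data[key] = value
        return True


def link(a, b):
    """Make the two nodes replicate to each other."""
    a.peer, b.peer = b, a


def partition(a, b):
    """Cut the replication link in both directions (split-brain)."""
    a.connected = False
    b.connected = False


if __name__ == "__main__":
    a, b = Node("a"), Node("b")
    link(a, b)
    partition(a, b)
    # Two clients land on different sides of the split.
    first = a.set_nx("lock:resource-42", "client-1")
    second = b.set_nx("lock:resource-42", "client-2")
    print(first, second)  # both exclusive sets succeed
```

With the link up, the second `set_nx` is rejected because the key has already been replicated. Once the link is cut, each side accepts its own writer, which is exactly the double-lock case described above.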
OK, things get more complicated, but this setup seems to be split-brain proof:
Note that we had to add more stuff here: a WAIT 1 command for SET/EXPIRE, to ensure that we are writing to a healthy split (preventing the case where a client connects to an unhealthy node before the load balancer learns that it is unhealthy).
And this is when a sudden thought struck: what about consistency? Neither of these setups with multiple writable nodes provides any guard against two clients both locking the same key on different nodes!
Redis and KeyDB both have asynchronous replication, so there seems to be no guarantee that if an (exclusive) SET succeeds as a command, it will not get overwritten by another SET with the same key issued on another master a split second later.
Adding WAITs does not help here, because WAIT only covers spreading information from a master to its replicas, and seems to have no effect on these overlapping waves of overwrites spreading from multiple masters.
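For concreteness, the SET followed by WAIT 1 step discussed above might look like the sketch below, assuming a redis-py style client with `set(..., nx=, px=)` and `wait(num_replicas, timeout)`. `set_with_wait` and `StubClient` are hypothetical names, and the stub exists only so the sketch runs without a server. As noted, this only confirms that one replica acknowledged the write; it does nothing about a concurrent SET on another master.

```python
def set_with_wait(client, key, value, ttl_ms, replicas=1, wait_timeout_ms=100):
    """Exclusive SET, then WAIT until `replicas` replicas acknowledge it.

    Returns True only if the key was set and enough replicas acknowledged
    the write within the timeout. If the acknowledgement is missing, the
    key is left in place and simply expires through its TTL.
    """
    if not client.set(key, value, nx=True, px=ttl_ms):
        return False  # somebody else already holds the key on this node
    acked = client.wait(replicas, wait_timeout_ms)
    return acked >= replicas


class StubClient:
    """Hypothetical stand-in that reports a fixed number of acknowledgements."""

    def __init__(self, acked):
        self.acked = acked
        self.data = {}

    def set(self, name, value, nx=False, px=None):
        if nx and name in self.data:
            return None
        self.data[name] = value
        return True

    def wait(self, num_replicas, timeout):
        # A real server returns how many replicas acknowledged the write.
        return self.acked
```

A caller would treat a False result as "lock not acquired" and retry later or against another node. The timeout value is an arbitrary placeholder.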
Okay, now this is actually the distributed lock problem, and both Redis and KeyDB provide the same answer: use the Redlock algorithm. But it seems to be far too complex:
So, what options do we have? Both Redlock explanations start from a single-node version, which is OK if the node never dies and is always available. That is surely not the case, but we are willing to accept the problems explained in the section "Why failover-based implementations are not enough", because we believe failovers will be quite rare, and we think that we fall under this clause:
Sometimes it is perfectly fine that under special circumstances, like during a failure, multiple clients can hold the lock at the same time. If this is the case, you can use your replication based solution.
So, having said all of this, let me finally get to the question: how do I set up a fault-tolerant "replication-based solution" with KeyDB in Kubernetes that has a single write node most of the time?
Also, what would restore a previously dead master node in such a way that it would not become a master again, but a replica of a substitute master?
Do we need some K8s operator for this? (Those that I found were not smart enough to do this).
And this is where I'd like to ask for your help!
I've found frustratingly little info on the topic, and it does not seem that many people face the problems we do. What are we doing wrong? How do you cope with Redis in the cloud?
Upvotes: 4
Views: 1517
Reputation: 2156
We've just switched to KeyDB for the same reasons you mention.
We've put HAProxy between our app and KeyDB to route all traffic to the "first" online node. We can't spread the read traffic across nodes, but it seems to have fixed the lock issue (without using Redlock) and should give us HA in case of an issue with a KeyDB pod.
Here's a snippet of our HAProxy config:
listen keydb
    bind *:6379
    mode tcp
    balance first
    option tcp-check
    tcp-check connect
    tcp-check send "AUTH default $REDIS_PASSWORD\r\n"
    tcp-check expect string +OK
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:active-replica
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server-template srv 5 our-app-keydb-headless.our-namespace.svc.cluster.local:6379 resolvers k8sdnspolicy check inter 1s init-addr none

# resolvers is based upon https://stackoverflow.com/a/76333027/1178671 to help with ES pods being moved on upgrades, etc
# Due to how this works, we must use FQDN for es servers above
resolvers k8sdnspolicy
    parse-resolv-conf
    hold valid 30s
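If you want to reproduce the role check outside HAProxy (for example in a readiness probe script), the reply to INFO replication is plain "key:value" text. A rough sketch of parsing it follows. `parse_info` and `is_active_replica` are made-up helper names, and the sample reply is hand-written rather than captured from a real node.

```python
def parse_info(text):
    """Parse an INFO-style reply ("key:value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers like "# Replication"
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def is_active_replica(info_text):
    """True if the node reports role:active-replica, as the tcp-check expects."""
    return parse_info(info_text).get("role") == "active-replica"


if __name__ == "__main__":
    # Hand-written sample reply, for illustration only.
    sample = "# Replication\r\nrole:active-replica\r\n"
    print(is_active_replica(sample))
```

Note that HAProxy's `expect string` is a substring match on the reply, while this compares the `role` field exactly, so the two are close but not identical.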
We're using this KeyDB helm chart https://artifacthub.io/packages/helm/enapter/keydb
I hope this helps :)
Upvotes: 1