gemart

Reputation: 376

AWS ElastiCache Changed Primary node for Redis cluster

I'm working on an API that is using Redis which is hosted on ElastiCache with three nodes (one primary, two replicas). For some reason over the weekend, the primary was switched to node 002 (from 001) which caused READONLY errors for my application when it tried to send messages to Redis.

Is there any reason why this should ever happen without doing it manually?

Upvotes: 2

Views: 1569

Answers (1)

James Thorpe

Reputation: 32212

We had alerts saying AWS has been doing rolling replacement work on the Redis clusters over the last few weeks - I'd guess this hit your cluster over the weekend.

As for the roles of nodes swapping - this is part of ElastiCache being a managed service: you can and should expect node roles to change while AWS performs this behind-the-scenes maintenance. If the primary node died entirely, you'd expect a replica to take over, and when a replacement came up for the dead primary, it would join as a replica of the new primary.
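If you want to verify which role a node currently holds, Redis reports it in the output of `INFO replication` (a `role:master` or `role:slave` line). A minimal sketch of parsing that output - the function name and sample text here are illustrative, not from any library:

```python
def node_role(info_text: str) -> str:
    """Extract the 'role:' value from Redis INFO replication output.

    Returns 'master', 'slave', or 'unknown' if no role line is found.
    """
    for line in info_text.splitlines():
        if line.startswith("role:"):
            return line.split(":", 1)[1].strip()
    return "unknown"
```

With a real client you'd feed this the raw `INFO replication` text (or use a client that already parses it, e.g. redis-py's `client.info("replication")["role"]`).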

Having had this work done on a few of ours, I just double-checked two of them: the primary has switched on one cluster but not on the other - client-side code needs to take this possibility into account.
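One way to take it into account is to catch the READONLY error, re-resolve the primary (for a cluster-mode-disabled ElastiCache replication group, reconnecting via the cluster's primary endpoint DNS name picks up the new primary), and retry. A sketch, with a stand-in exception class so it's self-contained - in practice you'd catch `redis.exceptions.ReadOnlyError` from redis-py, and `reconnect` would rebuild your client:

```python
import time


class ReadOnlyError(Exception):
    """Stand-in for redis.exceptions.ReadOnlyError (raised when a write
    hits a node that is currently a replica)."""


def run_with_failover_retry(operation, reconnect, retries=3, delay=0.5):
    """Run operation(); if it fails with a READONLY error, call
    reconnect() to re-resolve the primary endpoint, wait briefly,
    and try again. Re-raises after the final attempt."""
    for attempt in range(retries):
        try:
            return operation()
        except ReadOnlyError:
            if attempt == retries - 1:
                raise
            reconnect()  # e.g. re-create the client against the primary endpoint
            time.sleep(delay)
```

The names and retry policy here are illustrative; the point is that a single failed write after a failover should trigger a reconnect rather than crash the request path.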

Upvotes: 3

Related Questions