Reputation: 479
As far as I know, if you autoscale memcached nodes and use some sort of dynamic trigger to add the new nodes to your app's server list, you essentially invalidate the cache in doing so, because adding nodes changes the hash mapping that assigns keys to shards. If that's the case, then load-based autoscaling for memcached isn't a good idea. Is this correct?
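To illustrate the concern, here is a minimal sketch (not ElastiCache-specific, node names invented) of naive modulo sharding. When the node count changes, most keys map to a different node, which is effectively a cache flush:

```python
import hashlib

def node_for(key, nodes):
    # Stable hash so the key->node mapping is deterministic across runs.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

old_nodes = ["cache-1", "cache-2", "cache-3"]
new_nodes = old_nodes + ["cache-4"]  # one node added by "autoscaling"

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(node_for(k, old_nodes) != node_for(k, new_nodes) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # typically around 75% for 3 -> 4 nodes
```

Every remapped key is a guaranteed miss against the new node layout, even though the data is still sitting on the old node.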
Does AWS ElastiCache with Auto Discovery have some sort of smarts to stop this happening? It also supports adding nodes, and clients connect via a single endpoint. As far as I can see the answer is no: it just alters the client configuration dynamically based on the server list in the discovery record, so it will suffer the same problem. But hopefully someone more in the know than me can say either way.
For background, I'm looking at AWS Opsworks and wondering whether to use Elasticache or a memcached layer.
Upvotes: 1
Views: 2907
Reputation: 1633
Note that Auto Discovery is not based on a single IP as stated in the question. The IP can change over time. When you query the configuration endpoint, ElastiCache routes the request to a healthy node in the cluster.
Now on to your question...
Whether or not auto scaling "is a good idea" could depend partially on whether or not your app can tolerate key remapping.
And whether or not your keys remap when the cluster changes really depends on how you use your memcached client -- not on the AWS Elasticache service itself.
For the memcached client libraries I've seen, I believe your assumption is correct: if you add or remove nodes from the cluster and you're using an Auto Discovery client, the keys will remap.
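How badly the keys remap depends on the hashing strategy the client uses. As a sketch (node names and replica count are illustrative, not from any particular library), a ketama-style consistent hash ring moves only roughly 1/N of the keys when a node is added, instead of most of them:

```python
import hashlib
from bisect import bisect

class Ring:
    """Toy consistent hash ring: each node owns many points on the ring."""

    def __init__(self, nodes, replicas=100):
        self.ring = sorted(
            (self._h(f"{node}-{i}"), node) for node in nodes for i in range(replicas)
        )
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # A key belongs to the first ring point at or after its hash (wrapping).
        idx = bisect(self.points, self._h(key)) % len(self.points)
        return self.ring[idx][1]

old = Ring(["cache-1", "cache-2", "cache-3"])
new = Ring(["cache-1", "cache-2", "cache-3", "cache-4"])

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(old.node_for(k) != new.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 1/4 for 3 -> 4 nodes
```

So if your client hashes with a consistent scheme rather than simple modulo, a scale-up event costs you roughly the share of keys the new node takes over, not the whole cache.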
If you absolutely must have Auto Scaling and absolutely must avoid key remapping, there is a work-around: configure your client up front with the full list of nodes you might ever scale up to, so the key-to-node mapping never changes.
Downsides to this approach are:
1. During normal run times, your cache client continually executes dead-node checks for nodes that don't exist.
2. After a scale-up event, the nodes used in your normal run configuration will still host most of the keys unless you do a little more work. For example, you could shard your keys and base the number of shards on the number of current nodes.
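The work-around above can be sketched like this (hostnames and fleet size are invented for illustration; a real client would also need timeouts/failover handling for the not-yet-launched nodes):

```python
import hashlib

# The client is configured with the *maximum* fleet it could ever scale to,
# so the key->node mapping is fixed. Nodes that aren't running yet just
# produce misses/failovers until a scale-up event launches them.
ALL_NODES = [f"cache-{i}.example.internal" for i in range(1, 9)]  # max of 8 nodes

def node_for(key):
    # Mapping depends only on the fixed list, never on how many nodes are live.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return ALL_NODES[h % len(ALL_NODES)]

# The same key lands on the same node before and after scaling,
# because scaling never changes ALL_NODES.
print(node_for("user:42"))
```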
That starts to get complicated.
What I've done in practice instead is scale Elasticache manually, scheduling the scaling events to occur during non-peak traffic times.
Upvotes: 2