Reputation: 21
I am trying to work out the best caching strategy for a REST API layer that queries and updates a customer registry database. We currently have 3 front-end servers, all talking to a central database server.
The idea is to return an ETag to the calling client, with the ETag matching the customer record's version id (a hash value that is updated on every change to the account). Update calls are accepted only if the received ETag matches the version id stored in the database.
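To make the idea concrete, here is a minimal sketch of the ETag check on update. The names `CustomerStore`, `compute_etag` and `PreconditionFailed` are made up for illustration, the database is replaced by an in-memory dict, and hashing the record's JSON form is only one assumed way of deriving the version id.

```python
import hashlib
import json


class PreconditionFailed(Exception):
    """Client's ETag no longer matches the stored version (maps to HTTP 412)."""


def compute_etag(record: dict) -> str:
    # Assumed version id: a hash of the record's canonical JSON form.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class CustomerStore:
    """In-memory stand-in for the central database (hypothetical)."""

    def __init__(self):
        self._rows = {}

    def get(self, customer_id):
        record = self._rows[customer_id]
        return dict(record), compute_etag(record)

    def put(self, customer_id, record, if_match=None):
        current = self._rows.get(customer_id)
        # Reject the update when the record exists and the ETag is outdated.
        if current is not None and compute_etag(current) != if_match:
            raise PreconditionFailed(customer_id)
        self._rows[customer_id] = dict(record)
        return compute_etag(record)
```

A client would send the ETag it received from the GET as `If-Match` on the update, and a mismatch would be answered with 412 Precondition Failed.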
Let's suppose that a client performs a GET for a customer record and is routed to Server 1 by the load balancer. Server 1 does not have the customer record cached, so it queries the database, caches the record locally and returns the record in the response, including the ETag header.
If a second client then performs the same GET for the same customer record and is routed to Server 2, Server 2 will also cache the entry locally and return the same ETag header.
Let's assume that the first client now performs an update call against the same record via Server 1. Server 1's cache is updated with the latest record details and the first client gets back a new ETag.
After this, the second client performs a conditional GET with the "If-None-Match" header set to the ETag it received earlier. The request hits Server 2 again. My assumption is that Server 2 will still have the old ETag cached and will return a 304 Not Modified response to the client. Is this a correct assumption?
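The scenario can be reproduced with a small simulation. This is only a sketch under assumptions: `Database` and `FrontEnd` are hypothetical stand-ins, the version id is a simple counter instead of a hash, and each front-end keeps a local cache with no invalidation at all.

```python
class Database:
    """Stand-in for the central database: maps id -> (record, version)."""

    def __init__(self):
        self.rows = {}

    def read(self, key):
        return self.rows[key]

    def write(self, key, record):
        _, version = self.rows.get(key, (None, 0))
        self.rows[key] = (record, version + 1)
        return version + 1


class FrontEnd:
    """Front-end server with a purely local cache and no invalidation."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def get(self, key, if_none_match=None):
        if key not in self.cache:
            self.cache[key] = self.db.read(key)
        record, version = self.cache[key]
        etag = f'"{version}"'
        if if_none_match == etag:
            return 304, None, etag
        return 200, record, etag

    def put(self, key, record):
        version = self.db.write(key, record)
        self.cache[key] = (record, version)  # only this server's cache is refreshed
        return f'"{version}"'


db = Database()
db.write("42", {"name": "Ann"})
server1, server2 = FrontEnd(db), FrontEnd(db)

_, _, etag_a = server1.get("42")                  # client 1 via Server 1
_, _, etag_b = server2.get("42")                  # client 2 via Server 2
new_etag = server1.put("42", {"name": "Ann B"})   # client 1 updates via Server 1
status, _, _ = server2.get("42", if_none_match=etag_b)  # client 2 revalidates
```

In this sketch `status` ends up as 304 even though the database already holds the newer version, which is the stale response described above.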
In this situation a client could easily get stale data, which would hurt the overall consistency of the data seen and used on the client side.
What would be needed to solve this and ensure that no stale customer record data is returned to clients at any time?
Thanks a lot!
Upvotes: 2
Views: 1645
Reputation: 12948
As described in this article published by Google, you can use hierarchical caching to solve the cache-invalidation problem in certain situations, especially for static assets.
The basic idea is to name each asset after its fingerprint (appended to the file name) and to make the top layer non-cacheable.
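As a rough illustration of fingerprinted names, the sketch below appends a truncated content hash to the file name. The function name `fingerprinted_name`, the choice of SHA-256 and the 8-character digest length are assumptions, not something taken from the article.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Return the path with a content fingerprint inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the name changes whenever the content changes, the asset itself can be cached for a long time, while the non-cacheable top layer (for example the HTML page) always points at the current name.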
Upvotes: 0
Reputation: 4554
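One more solution to add to @David's list: use a clustered cache shared by all front-end servers instead of a local cache on each server.
Possible clustered cache implementations are Couchbase and Redis Cluster. The most popular non-clustered implementation is memcached.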
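A minimal sketch of the shared-cache idea, with a plain in-memory object standing in for the clustered cache. `SharedCache` and `SharedCacheFrontEnd` are hypothetical names, the database is a plain dict, and no real Couchbase or Redis client API is used here.

```python
class SharedCache:
    """In-memory stand-in for a clustered cache shared by every front-end."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def set(self, key, value):
        self._items[key] = value


class SharedCacheFrontEnd:
    """Front-end that reads and writes one shared cache, not a local one."""

    def __init__(self, db, cache):
        self.db = db        # dict: id -> (record, version)
        self.cache = cache  # the same SharedCache instance on every server

    def get(self, key, if_none_match=None):
        entry = self.cache.get(key)
        if entry is None:
            entry = self.db[key]
            self.cache.set(key, entry)
        record, version = entry
        etag = f'"{version}"'
        if if_none_match == etag:
            return 304, None, etag
        return 200, record, etag

    def put(self, key, record):
        _, version = self.db.get(key, (None, 0))
        entry = (record, version + 1)
        self.db[key] = entry
        self.cache.set(key, entry)  # visible to all front-ends at once
        return f'"{version + 1}"'
```

Since every server sees the same cache entry, an update made through one server is what the next conditional GET on any other server is compared against.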
Upvotes: 1
Reputation: 34563
Cache invalidation is a hard problem to solve. There are at least 3 ways I've seen to solve this. They vary by complexity and by how long an expired record is still deemed valid.
The simplest answer is that all front-end servers must call the database to verify the etag before returning "304 Not Modified". This might be best if there are many updates or the cost of downloading a record from the database is high.
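A rough sketch of that first option, assuming the database can answer a cheap "what is the current version?" query. `VersionedDb`, `current_version` and `ValidatingFrontEnd` are made-up names, and the version id is a counter rather than a hash.

```python
class VersionedDb:
    """Stand-in for the central database: maps id -> (record, version)."""

    def __init__(self):
        self._rows = {}

    def write(self, key, record):
        _, version = self._rows.get(key, (None, 0))
        self._rows[key] = (record, version + 1)
        return version + 1

    def current_version(self, key):
        # Assumed to be much cheaper than fetching the whole record.
        return self._rows[key][1]

    def read(self, key):
        return self._rows[key]


class ValidatingFrontEnd:
    """Front-end that checks the version in the database on every GET."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def get(self, key, if_none_match=None):
        version = self.db.current_version(key)
        if if_none_match == f'"{version}"':
            return 304, None, f'"{version}"'
        cached = self.cache.get(key)
        if cached is None or cached[1] != version:
            cached = self.db.read(key)  # local copy missing or outdated
            self.cache[key] = cached
        record, version = cached
        return 200, record, f'"{version}"'

    def put(self, key, record):
        version = self.db.write(key, record)
        self.cache[key] = (record, version)
        return f'"{version}"'
```

Every request still costs one database round trip, but the full record is only fetched when the local copy is missing or outdated.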
If it is sometimes okay to send back an old value, then you can set an expiration time on your cached items.
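A minimal sketch of an expiring cache, assuming a fixed time-to-live per entry. `TtlCache` is a hypothetical name, and the clock is passed in only so the behaviour can be exercised without waiting.

```python
import time


class TtlCache:
    """Local cache whose entries expire a fixed number of seconds after being stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}

    def put(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: caller must reload from the database
            return None
        return value
```

With this approach a stale record can be served for at most `ttl_seconds` after an update made through another server.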
The other option is that when one front-end server handles an update, it tells the other front-end servers to expire that cached item (maybe by calling a webservice?). This allows for long cache durations but might be too chatty if there are lots of updates.
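A sketch of that last option, with the notification reduced to a direct method call. `CentralDb`, `NotifyingFrontEnd` and the `peers` list are assumptions for illustration. In a real deployment the call to `invalidate` would be a network request or a pub/sub message.

```python
class CentralDb:
    """Stand-in for the central database: maps id -> (record, version)."""

    def __init__(self):
        self.rows = {}

    def read(self, key):
        return self.rows[key]

    def write(self, key, record):
        _, version = self.rows.get(key, (None, 0))
        self.rows[key] = (record, version + 1)
        return version + 1


class NotifyingFrontEnd:
    """Front-end that tells its peers to drop an item after updating it."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.peers = []  # other front-end servers (hypothetical wiring)

    def invalidate(self, key):
        self.cache.pop(key, None)

    def get(self, key, if_none_match=None):
        if key not in self.cache:
            self.cache[key] = self.db.read(key)
        record, version = self.cache[key]
        etag = f'"{version}"'
        if if_none_match == etag:
            return 304, None, etag
        return 200, record, etag

    def put(self, key, record):
        version = self.db.write(key, record)
        self.cache[key] = (record, version)
        for peer in self.peers:
            peer.invalidate(key)  # stands in for a webservice call
        return f'"{version}"'
```

There is still a short window between the database write and the last peer being notified in which an old value can be served, so this narrows staleness rather than ruling it out.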
Upvotes: 3