Reputation: 46392
Some time ago I asked how to do Incremental updates using browser cache. Here I'm giving a short summary of the problem - for more context, especially the reason why I want to do this, please refer to the old question. I'd like you to review and improve my solution idea (just an idea, so don't send me to code review :D).
The client (a single page app) gets rather big lists from the server. This works fine and actually saves server resources. A few points about the setup:

- Some of these lists are user-specific, others are common to a group of users, others are global.
- All these lists may change anytime and we never want to serve stale data (the Cache-Control and Expires HTTP headers are of no direct use here).
- We're using 304 NOT MODIFIED, which helps when nothing has changed.
- When anything changes, the changes are usually small, but HTTP does not support this case at all, so we have to send the whole list including the unchanged parts.
- We could send the delta instead, but there's no obvious way for the browser to cache it efficiently (caching in localStorage or the like is by far not as good, as I explained in my linked question).
An important property of our lists is that every item has a unique id and a last-modified timestamp. The timestamp allows us to compute the delta easily by finding the items that have changed recently. The id allows us to apply the delta simply by replacing the corresponding items (the list is internally a Map<Id, Item>). This wouldn't work for deletions, but let's ignore them for now.
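To make the delta application concrete, here is a minimal sketch in TypeScript (the names Item and applyDelta are my own, not an existing API; the payload fields are assumed):

// The in-memory list is a Map keyed by the unique id.
interface Item {
  id: string;            // unique id
  lastModified: number;  // last-modified timestamp (epoch milliseconds, assumed)
  // ...further payload fields
}

// Applying a delta means inserting new items and replacing changed ones by id.
// Deletions are ignored here, as discussed above.
function applyDelta(list: Map<string, Item>, delta: Item[]): void {
  for (const item of delta) {
    list.set(item.id, item);
  }
}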
I'm suggesting to use multiple lists (any number should work) of varying sizes, with the bigger lists cacheable for longer. Let's assume a day is a suitable time unit and let's use the following three lists:
WEEK
This is the base list containing all items as they existed at some arbitrary time in the current week.
DAY
A list containing all items which have changed this week (except today), as they existed at some arbitrary time in the current day.
Items changed today may or may not be included.
CURRENT
A list containing all items which have changed today as they exist just now.
The client gets all three lists. It starts with WEEK, applies DAY (i.e., inserts new items and replaces old ones), and finally applies CURRENT.
Let's assume there are 1000 items in the list with 10 items changing per day.
- The WEEK list contains all 1000 items, but it can be cached until the end of the week. Its exact content is not specified and different clients may have different versions of it (as long as the condition from its definition above holds). This allows the server to cache the data for a whole week, but it also allows it to drop them, since serving the current state is fine, too.
- The DAY list contains up to 70 items (at most 10 changes per day over 7 days) and can be cached until the end of the day.
- The CURRENT list contains up to 10 items and can only be cached until anything changes.
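As a sketch of how these lifetimes could translate into Cache-Control values (my own assumption that days and weeks are delimited in UTC; any consistent boundary works):

// Seconds until the end of the current UTC day.
function secondsUntilEndOfUtcDay(now: Date): number {
  const endOfDay = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return Math.floor((endOfDay - now.getTime()) / 1000);
}

// Seconds until the end of the current UTC week (week ends Sunday night).
function secondsUntilEndOfUtcWeek(now: Date): number {
  const daysAfterToday = (7 - now.getUTCDay()) % 7; // getUTCDay(): 0 = Sunday
  const endOfWeek = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(),
                             now.getUTCDate() + daysAfterToday + 1);
  return Math.floor((endOfWeek - now.getTime()) / 1000);
}

// WEEK:    Cache-Control: max-age=<secondsUntilEndOfUtcWeek(now)>
// DAY:     Cache-Control: max-age=<secondsUntilEndOfUtcDay(now)>
// CURRENT: Cache-Control: no-cache   (always revalidate, e.g. via ETag/304)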
The client should know nothing about the time scale used, but it needs to know the number of lists to ask for. A "classical" request like
GET /api/order/123 // get the whole list with up to date content
will be replaced by three requests like
GET /api/0,order/123 // get the WEEK list
GET /api/1,order/123 // get the DAY list
GET /api/2,order/123 // get the CURRENT list
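Wired together on the client, this could look roughly as follows (a sketch reusing Item and applyDelta from the earlier sketch; the URLs are the illustrative ones above):

// Fetch the three lists and apply them in order: WEEK, then DAY, then CURRENT.
// Once warm, the browser's HTTP cache typically answers the first two requests
// without hitting the network.
async function loadOrder(orderId: string): Promise<Map<string, Item>> {
  const list = new Map<string, Item>();
  for (const index of [0, 1, 2]) {          // 0 = WEEK, 1 = DAY, 2 = CURRENT
    const response = await fetch(`/api/${index},order/${orderId}`);
    const delta: Item[] = await response.json();
    applyDelta(list, delta);
  }
  return list;
}

The three requests could also be issued in parallel, as long as the deltas are still applied in the WEEK, DAY, CURRENT order.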
Usually the changes are indeed as described, but sometimes all items change at once. When this happens, all three lists contain all items, meaning that we have to serve three times as much data. Fortunately, such events are very rare (e.g., when we add an attribute), but I'd like to see a way that lets us avoid such bursts.
Do you see any other problems with this idea?
Is there any solution for deletions apart from just marking the items as deleted and postponing the physical deletion until the caches expire (i.e., until the end of the week in my example)?
Any improvements?
Upvotes: 5
Views: 203
Reputation: 12310
I assume you understand the general problems with your approach: as with the localStorage approach, this falls a bit on the “clever” side, with implications for long-term maintainability. Clear docs and a test suite are a must.

Assuming this, I like your approach.
There’s one thing I might change. It adds a bit of flexibility, but also a bit of complexity. It may or may not be a good idea.
Instead of hardcoding three URLs on the client, you could send explicit hyperlinks in response headers. Here’s how it might work:
The client requests a hardcoded “entry point”:
> GET /api/order/123?delta=all
< 200 OK
< Cache-Control: max-age=604800
< Delta-Location: /api/order/123?delta=604800
<
< [...your WEEK list...]
Seeing the Delta-Location header, the client then requests it and applies the resulting delta:
> GET /api/order/123?delta=604800
< 200 OK
< Cache-Control: max-age=86400
< Delta-Location: /api/order/123?delta=86400
<
< [...your DAY list...]
And so on, until the response has no Delta-Location.
This allows the server to change the delta structure unilaterally at any time. (Of course, it still has to support the old structure for as long as it may be cached on the clients.)
In particular, this lets you solve the bursts problem. After performing a mass change, you could start serving much smaller deltas (with correspondingly smaller max-age), such that they exclude the mass change. Then you would gradually increase the delta sizes as time goes by. This would involve extra logic/configuration on the server side, but I'm sure you can figure it out if the bursts are a real concern for you.
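To make that a bit more concrete, one possible policy (purely illustrative, and only one of many ways to do it): a base list served after the mass change already contains it, so the delta handed out with that base only needs to reach back to the time of the mass change, and it can grow back to its normal size as time passes.

// Illustrative policy only. `normalDeltaSeconds` is the usual window (e.g. 604800),
// `massChangeTime` the time of the last mass change, `now` the current time,
// the latter two in epoch milliseconds.
function effectiveDeltaSeconds(normalDeltaSeconds: number,
                               massChangeTime: number,
                               now: number): number {
  const sinceMassChange = Math.floor((now - massChangeTime) / 1000);
  // Never reach back across the mass change; a base list served after it
  // already contains the mass-changed items.
  return Math.min(normalDeltaSeconds, Math.max(sinceMassChange, 0));
}

// The Delta-Location and max-age values would then be derived from this
// effective window instead of the fixed 604800/86400 figures.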
Ideally you would resolve Delta-Location against the request URL, so it behaves like the standard Location and Content-Location headers, for uniformity and flexibility. One way to do that in JavaScript is the URL object.
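A client-side sketch of following the chain (fetch, headers.get and URL are standard browser APIs; the item shape and the merge-by-id logic are assumptions matching the question):

type Item = { id: string; lastModified: number };  // assumed item shape

// Follow the Delta-Location chain: fetch the entry point, merge each response
// into the list by id, and resolve each Delta-Location against the URL of the
// response that carried it.
async function loadWithDeltas(entryPoint: string): Promise<Map<string, Item>> {
  const list = new Map<string, Item>();
  let url: string | null = entryPoint;
  while (url !== null) {
    const response = await fetch(url);
    const delta: Item[] = await response.json();
    for (const item of delta) {
      list.set(item.id, item);                     // insert or replace by id
    }
    const next = response.headers.get("Delta-Location");
    // Resolve relative references like the standard Location/Content-Location.
    url = next === null ? null : new URL(next, response.url).toString();
  }
  return list;
}

// loadWithDeltas("/api/order/123?delta=all");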
Further things you could tweak in this hyperlinks approach:

- You could make max-age slightly smaller than delta, to account for network delays.
- You could use the standard Link header instead of a non-standard Delta-Location (illustrated below). But you'd still need a non-standard relation type, so it's not clear what this would buy you.
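For illustration, such a response could look like this (the relation type URI is made up):

< 200 OK
< Cache-Control: max-age=604800
< Link: </api/order/123?delta=604800>; rel="https://example.com/rels/delta"
<
< [...your WEEK list...]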
. But you’d still need a non-standard relation type, so it’s not clear what this would buy you.Upvotes: 2
Reputation: 48357
Yes, I see big problems with this. That it is a big list implies that the client has a lot of work to do to pull down the resources it needs, and that has a big impact on performance.
All these lists may change anytime and we never want to serve stale data
So you should be using long cache times and cache-busting URLs.
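For illustration (the version token and values are made up; the client has to learn the current token from somewhere, e.g. a small uncached index):

GET /api/order/123?v=20240117-42           // version token changes whenever the list changes
Cache-Control: max-age=31536000, immutable // the response for a given version can be cached "forever"

A given version is then fetched at most once, and any change shows up under a new URL.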
We're using 304 NOT MODIFIED
That's about the worst possible way to address the problem. Most of the cost of retrieval is in latency. If you are replying with a 304 response then you've already incurred most of that cost - this will be particularly pronounced when you are dealing with small pieces of data. HTTP/2 helps (compared with 1.0 and 1.1) but doesn't eliminate the cost.
I would also question a lot of the assumptions made in your original question.
Upvotes: 2