Reputation: 46392
Some time ago I asked how to do Incremental updates using browser cache. Here I'm giving a short summary of the problem - for more context, especially the reason why I want to do this, please refer to the old question. I'd like you to review and improve my solution idea (just an idea, so don't send me to code review :D).
The client (a single page app) gets rather big lists from the server. This works fine and actually saves server resources. A few points about the setup:

- Some of these lists are user-specific, others are common to a group of users, others are global.
- All these lists may change anytime and we never want to serve stale data (the Cache-Control and Expires HTTP headers are of no direct use here).
- We're using 304 NOT MODIFIED, which helps when nothing has changed.
- When anything changes, the changes are usually small, but HTTP does not support this case at all, so we have to send the whole list including the unchanged parts.
- We could send the delta instead, but there's no obvious way for the browser to cache it efficiently (caching in localStorage or the like is by far not as good, as I explained in my linked question).
An important property of our lists is that every item has a unique id and a last-modified timestamp. The timestamp allows us to compute the delta easily by finding the items that have changed recently. The id allows us to apply the delta simply by replacing the corresponding items (the list is internally a Map<Id, Item>). This wouldn't work for deletions, but let's ignore them for now.
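To make the delta application concrete, here is a minimal sketch in TypeScript (the names Item and applyDelta are my own, not an existing API; the payload fields are assumed):

// The in-memory list is a Map keyed by the unique id.
interface Item {
  id: string;            // unique id
  lastModified: number;  // last-modified timestamp (epoch milliseconds, assumed)
  // ...further payload fields
}

// Applying a delta means inserting new items and replacing changed ones by id.
// Deletions are ignored here, as discussed above.
function applyDelta(list: Map<string, Item>, delta: Item[]): void {
  for (const item of delta) {
    list.set(item.id, item);
  }
}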
I'm suggesting to use multiple lists (any number should work) of varying sizes, with the bigger lists cacheable for longer. Let's assume a day is a suitable time unit and let's use the following three lists:
WEEK
This is the base list containing all items as they existed at some arbitrary time in the current week.
DAY
A list containing all items which have changed this week (except today), as they existed at some arbitrary time in the current day.
Items changed today may or may not be included.
CURRENT
A list containing all items which have changed today as they exist just now.
The client gets all three lists. It starts with WEEK, applies DAY (i.e., inserts new items and replaces old ones), and finally applies CURRENT.
Let's assume there are 1000 items in the list with 10 items changing per day.
- The WEEK list contains all 1000 items, but it can be cached until the end of the week. Its exact content is not specified and different clients may have different versions of it (as long as the condition from its definition above holds). This allows the server to cache the data for a whole week, but it also allows it to drop them, since serving the current state is fine, too.
- The DAY list contains up to 70 items (at most 10 changes per day over 7 days) and can be cached until the end of the day.
- The CURRENT list contains up to 10 items and can only be cached until anything changes.
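As a sketch of how these lifetimes could translate into Cache-Control values (my own assumption that days and weeks are delimited in UTC; any consistent boundary works):

// Seconds until the end of the current UTC day.
function secondsUntilEndOfUtcDay(now: Date): number {
  const endOfDay = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return Math.floor((endOfDay - now.getTime()) / 1000);
}

// Seconds until the end of the current UTC week (week ends Sunday night).
function secondsUntilEndOfUtcWeek(now: Date): number {
  const daysAfterToday = (7 - now.getUTCDay()) % 7; // getUTCDay(): 0 = Sunday
  const endOfWeek = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(),
                             now.getUTCDate() + daysAfterToday + 1);
  return Math.floor((endOfWeek - now.getTime()) / 1000);
}

// WEEK:    Cache-Control: max-age=<secondsUntilEndOfUtcWeek(now)>
// DAY:     Cache-Control: max-age=<secondsUntilEndOfUtcDay(now)>
// CURRENT: Cache-Control: no-cache   (always revalidate, e.g. via ETag/304)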
The client should know nothing about the time scale used, but it needs to know the number of lists to ask for. A "classical" request like
GET /api/order/123 // get the whole list with up to date content
will be replaced by three requests like
GET /api/0,order/123 // get the WEEK list
GET /api/1,order/123 // get the DAY list
GET /api/2,order/123 // get the CURRENT list
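Wired together on the client, this could look roughly as follows (a sketch reusing Item and applyDelta from the earlier sketch; the URLs are the illustrative ones above):

// Fetch the three lists and apply them in order: WEEK, then DAY, then CURRENT.
// Once warm, the browser's HTTP cache typically answers the first two requests
// without hitting the network.
async function loadOrder(orderId: string): Promise<Map<string, Item>> {
  const list = new Map<string, Item>();
  for (const index of [0, 1, 2]) {          // 0 = WEEK, 1 = DAY, 2 = CURRENT
    const response = await fetch(`/api/${index},order/${orderId}`);
    const delta: Item[] = await response.json();
    applyDelta(list, delta);
  }
  return list;
}

The three requests could also be issued in parallel, as long as the deltas are still applied in the WEEK, DAY, CURRENT order.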
Usually the changes are indeed as described, but sometimes all items change at once. When this happens, all three lists contain all items, meaning that we have to serve three times as much data. Fortunately, such events are very rare (e.g., when we add an attribute), but I'd like to see a way that lets us avoid such bursts.
Do you see any other problems with this idea?
Is there any solution for deletions apart from just marking the items as deleted and postponing the physical deletion until the caches expire (i.e., until the end of the week in my example)?
Any improvements?
Upvotes: 5
Views: 203
Reputation: 12310
I assume you understand the general problems with your approach: as with the localStorage approach, this falls a bit on the “clever” side, with implications for long-term maintainability. Clear docs and a test suite are a must.

Assuming this, I like your approach.
There’s one thing I might change. It adds a bit of flexibility, but also a bit of complexity. It may or may not be a good idea.
Instead of hardcoding three URLs on the client, you could send explicit hyperlinks in response headers. Here’s how it might work:
The client requests a hardcoded “entry point”:
> GET /api/order/123?delta=all
< 200 OK
< Cache-Control: max-age=604800
< Delta-Location: /api/order/123?delta=604800
<
< [...your WEEK list...]
Seeing the Delta-Location header, the client then requests it and applies the resulting delta:
> GET /api/order/123?delta=604800
< 200 OK
< Cache-Control: max-age=86400
< Delta-Location: /api/order/123?delta=86400
<
< [...your DAY list...]
And so on, until the response has no Delta-Location.
This allows the server to change the delta structure unilaterally at any time. (Of course, it still has to support the old structure for as long as it may be cached on the clients.)
In particular, this lets you solve the bursts problem. After performing a mass change, you could start serving much smaller deltas (with correspondingly smaller max-age), such that they exclude the mass change. Then you would gradually increase the delta sizes as time goes by. This would involve extra logic/configuration on the server side, but I'm sure you can figure it out if the bursts are a real concern for you.
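To make that a bit more concrete, one possible policy (purely illustrative, and only one of many ways to do it): a base list served after the mass change already contains it, so the delta handed out with that base only needs to reach back to the time of the mass change, and it can grow back to its normal size as time passes.

// Illustrative policy only. `normalDeltaSeconds` is the usual window (e.g. 604800),
// `massChangeTime` the time of the last mass change, `now` the current time,
// the latter two in epoch milliseconds.
function effectiveDeltaSeconds(normalDeltaSeconds: number,
                               massChangeTime: number,
                               now: number): number {
  const sinceMassChange = Math.floor((now - massChangeTime) / 1000);
  // Never reach back across the mass change; a base list served after it
  // already contains the mass-changed items.
  return Math.min(normalDeltaSeconds, Math.max(sinceMassChange, 0));
}

// The Delta-Location and max-age values would then be derived from this
// effective window instead of the fixed 604800/86400 figures.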
Ideally you would resolve Delta-Location against the request URL, so it behaves like the standard Location and Content-Location headers, for uniformity and flexibility. One way to do that in JavaScript is the URL object.
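A client-side sketch of following the chain (fetch, headers.get and URL are standard browser APIs; the item shape and the merge-by-id logic are assumptions matching the question):

type Item = { id: string; lastModified: number };  // assumed item shape

// Follow the Delta-Location chain: fetch the entry point, merge each response
// into the list by id, and resolve each Delta-Location against the URL of the
// response that carried it.
async function loadWithDeltas(entryPoint: string): Promise<Map<string, Item>> {
  const list = new Map<string, Item>();
  let url: string | null = entryPoint;
  while (url !== null) {
    const response = await fetch(url);
    const delta: Item[] = await response.json();
    for (const item of delta) {
      list.set(item.id, item);                     // insert or replace by id
    }
    const next = response.headers.get("Delta-Location");
    // Resolve relative references like the standard Location/Content-Location.
    url = next === null ? null : new URL(next, response.url).toString();
  }
  return list;
}

// loadWithDeltas("/api/order/123?delta=all");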
Further things you could tweak in this hyperlinks approach:

- You could make max-age slightly smaller than delta, to account for network delays.
- You could use the standard Link header instead of a non-standard Delta-Location (illustrated below). But you'd still need a non-standard relation type, so it's not clear what this would buy you.
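For illustration, such a response could look like this (the relation type URI is made up):

< 200 OK
< Cache-Control: max-age=604800
< Link: </api/order/123?delta=604800>; rel="https://example.com/rels/delta"
<
< [...your WEEK list...]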
. But you’d still need a non-standard relation type, so it’s not clear what this would buy you.Upvotes: 2
Reputation: 48357
Yes, I see big problems with this. That it is a big list implies that the client has a lot of work to do to pull down the resources it needs, and that has a big impact on performance.
All these lists may change anytime and we never want to serve stale data
So you should be using long cache times and cache-busting URLs.
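For illustration (the version token and values are made up; the client has to learn the current token from somewhere, e.g. a small uncached index):

GET /api/order/123?v=20240117-42           // version token changes whenever the list changes
Cache-Control: max-age=31536000, immutable // the response for a given version can be cached "forever"

A given version is then fetched at most once, and any change shows up under a new URL.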
We're using 304 NOT MODIFIED
That's about the worst possible way to address the problem. Most of the cost of retrieval is in latency. If you are replying with a 304 response then you've already incurred most of that cost - this will be particularly pronounced when you are dealing with small pieces of data. HTTP/2 helps (compared with 1.0 and 1.1) but doesn't eliminate the cost.
I would also question a lot of the assumptions made in your original question.
Upvotes: 2