RTF

Reputation: 6494

Checking if HTTP resource has changed after maximum cache time has expired

I'm trying to work out a new caching policy for the static resources on a website. A common problem is that whenever JavaScript, CSS etc. is updated, many users hold onto stale versions, because currently there are no caching-specific HTTP headers included in the file responses.

This becomes a serious problem when, for example, the JavaScript updates are linked to server-side updates, and the stale JavaScript chokes on the new server responses.

Eliminating browser caching completely with a cache-control: max-age=0, no-cache seems like overkill, since I'd still like to take some pressure off the server by letting browsers cache temporarily. So, setting the cache policy to a maximum of one hour seems alright, like cache-control: max-age=3600, no-cache.

My understanding is that this will always fetch a new copy of the resource if the cached copy is older than one hour. I'd specifically like to know if it's possible to set an HTTP header or combination of headers that will instruct browsers to only fetch a new copy if the resource was last checked more than one hour ago AND if the resource has changed.

I'm just trying to avoid browsers blindly fetching new copies just because the cached resource is older than one hour, so I'd also like to add the condition that the resource has been changed.

Just to illustrate further what I'm asking:

  1. New user arrives at site and gets fresh copy of script.js
  2. User stays on site for 45 mins, browser uses cached copy of script.js all the time
  3. User comes back to site 2 hours later, and browser asks the server if script.js has changed
  4. If it has, then it gets a fresh copy and the process repeats
  5. If it has not changed, then it uses the cached copy for the next hour, after which it will check again

Have I misunderstood things? Is what I'm asking how it actually works, or do I have to do something different?

Upvotes: 2

Views: 3390

Answers (2)

Fraser

Reputation: 17039

"Have I misunderstood things? Is what I'm asking how it actually works, or do I have to do something different?"

You have some serious misconceptions about what the various cache control directives do and why cache behaves as it does.

"Eliminating browser caching completely with a cache-control: max-age=0, no-cache seems like overkill, since I'd still like to take some pressure off the server by letting browsers cache temporarily ... The no-cache option is wrong too. Including it means the browser will always check with the server for modifications to the file every time."

That isn't what the no-cache directive means or what it is intended for - it means that a client MUST NOT use a cached copy to satisfy a subsequent request without successful revalidation. It does not mean, and has never meant, "do not cache" - that is what the no-store directive is for.

Also, the max-age directive is just the primary means for caches to calculate the freshness lifetime and expiration time of cached entries. The Expires header (minus the value of the Date header) can also be used, as can a heuristic based on the current UTC time and any Last-Modified header value.
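As a rough illustration of that order of precedence, here is a minimal Python sketch. The function name `freshness_lifetime` is hypothetical, and the 10% heuristic is only the typical value suggested by the HTTP caching spec; real browsers use their own heuristics and handle more directives than this does.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict[str, str]) -> float:
    """Return an estimated freshness lifetime in seconds (illustrative only)."""
    # 1. max-age is the primary source when present.
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        return float(match.group(1))

    # Fall back to "now" if the response carried no Date header.
    if "Date" in headers:
        date = parsedate_to_datetime(headers["Date"])
    else:
        date = datetime.now(timezone.utc)

    # 2. Expires minus Date.
    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        return max((expires - date).total_seconds(), 0.0)

    # 3. Heuristic: assumed here to be 10% of the time since Last-Modified.
    if "Last-Modified" in headers:
        last_modified = parsedate_to_datetime(headers["Last-Modified"])
        return max((date - last_modified).total_seconds() / 10, 0.0)

    # Nothing to go on: treat the entry as immediately stale.
    return 0.0
```

Note that this only decides how long a cached copy may be reused without asking the server; it says nothing about what happens once that time is up.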

Really, if your goal is to retain the cached copy of a resource for as long as it is meaningful, whilst minimising requests and responses, you have a number of options.

  1. The ETag (Entity Tag) header - this is supplied by the server in response to a request in either a "strong" or "weak" form. It is usually a hash based on the resource in question. When a client re-requests a resource it can pass the stored value of the ETag with the If-None-Match request header. If the resource has not changed then the server will respond with 304 Not Modified.

    You can think of ETags as fingerprints for resources. They can be used to massively reduce the amount of information sent over the wire - as only fresh data is served - but they do not have any bearing on the number or frequency of requests.

  2. The Last-Modified header - this is supplied by the server in response to a request in HTTP-date format - it tells the client the last time the resource was modified. When a client re-requests a resource it can pass the stored value of the Last-Modified header with the If-Modified-Since request header. If the resource has not changed since that time then the server will respond with 304 Not Modified.

    You can think of Last-Modified as a weaker form of entity checking than ETags. It addresses the same problem (bandwidth/redundancy) in a less robust way, and again it has no bearing at all on the actual number of requests made.

  3. Revving - a technique that uses a combination of the Expires header and the name (URN) of a resource (see Steve Souders' blog post).

    Here one basically sets a far forward Expires header - say 5 years from now - to ensure the static resource is cached for a long time.

    You then have two options for updating: either append a versioning query string to the request URL - e.g. "/mystyles.css?v=1.1" - and update the version number as and when the resource changes, or, better, version the file name itself, e.g. "/mystyles.v1.1.css", so that each version is cached for as long as possible.

    This way you not only reduce the amount of bandwidth used - you also eliminate all checks to see if the resource has changed until you rename it.
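One common variant of the file-name approach in option 3 is to derive the version from the file's content rather than maintaining a number by hand. The sketch below is only an illustration of that idea; the helper name `revved_name` and the 8-character SHA-256 prefix are assumptions, not something prescribed above.

```python
import hashlib
from pathlib import PurePosixPath


def revved_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension (hypothetical helper)."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    # "/mystyles.css" becomes "/mystyles.<digest>.css"
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the name changes only when the content does, each revved URL can safely be served with a far-future Expires (or a large max-age), and the pages that reference it pick up the new name on the next deploy.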

I suppose the main point here is that none of the cache control directives you mention (max-age, public, etc.) have any bearing at all on whether a 304 response is generated or not. For that, use either ETag / If-None-Match or Last-Modified / If-Modified-Since, or a combination of them (with If-Modified-Since as a fallback mechanism to If-None-Match).
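To make the validator logic concrete, here is a minimal, framework-free Python sketch of how a server might decide between 200 and 304. The function `respond` and the truncated SHA-256 ETag are assumptions for illustration; in practice your web server or framework normally does this for static files.

```python
import hashlib
from email.utils import parsedate_to_datetime


def respond(request_headers: dict[str, str], body: bytes, last_modified: str):
    """Return (status, headers, body) for a GET of one static resource."""
    # Assumed ETag scheme: quoted, truncated hash of the content.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"ETag": etag, "Last-Modified": last_modified}

    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # If-None-Match wins when both are sent; ignore the weak "W/" prefix.
        tags = [t.strip().removeprefix("W/") for t in if_none_match.split(",")]
        not_modified = "*" in tags or etag in tags
    elif if_modified_since is not None:
        # Fallback: compare modification time with the client's stored date.
        not_modified = (
            parsedate_to_datetime(last_modified)
            <= parsedate_to_datetime(if_modified_since)
        )
    else:
        not_modified = False

    if not_modified:
        return 304, headers, b""  # no body: the client reuses its cached copy
    return 200, headers, body
```

Nothing in this function looks at Cache-Control: the directives decide when the client asks, and the validators decide what the server answers.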

Upvotes: 3

RTF

Reputation: 6494

It seems that I have misunderstood how it works, because some testing in Chrome has revealed exactly the behavior that I was looking for in the 5 steps I mentioned.

It doesn't blindly grab a fresh copy from the server when the max-age has expired. It does a GET, and if the response is 304 (Not Modified), it continues using the cached copy until the next hour has expired, at which point it checks for changes again etc.

The no-cache option is wrong too. Including it means the browser will always check with the server for modifications to the file every time. So what I was really looking for is:

Cache-Control: public, max-age=3600 
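The behaviour I saw can be modelled with a small Python sketch. This is only a simplified model of a browser cache, not how Chrome is actually implemented; `CacheEntry`, `load`, and the use of a bare string as the server's current ETag are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    etag: str            # validator stored with the cached copy
    validated_at: float  # time (seconds) of the last 200 or 304


def load(entry: Optional[CacheEntry], now: float, server_etag: str,
         max_age: float = 3600):
    """Return (action, new_entry) for one page load at time `now`."""
    if entry is None:
        return "200 full download", CacheEntry(server_etag, now)
    if now - entry.validated_at < max_age:
        # Still fresh: the server is not contacted at all.
        return "cache hit, no request", entry
    if entry.etag == server_etag:
        # Stale but unchanged: conditional GET answered with 304,
        # and the freshness clock starts again.
        return "304 not modified", CacheEntry(entry.etag, now)
    return "200 full download", CacheEntry(server_etag, now)
```

Running the five steps from the question through `load` gives a full download on the first visit, cache hits for the next hour, and then either a 304 or a fresh 200 depending on whether the file changed.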

Upvotes: 0
