Computing an ETag for a REST API

Question

We're building REST APIs in which we use ETags for two purposes:

1. Save bandwidth by letting the client avoid re-downloading an unchanged resource (not that important to us)
2. Address concurrency issues (the lost update problem)

From a practical perspective, I'm wondering what to use to compute the ETag.

Item hash

We're using a hash of the JSON dump of the item object sent in the response. This works fine, and it is easy to check on a PUT request: pull the item from the DB, compute the hash, compare. However, it makes the separation of concerns a bit "leaky": the layer that builds the response from the item is interleaved with the layer responsible for computing the ETag. Besides, additional data (response headers) may matter, and if it does, sending a 304 just because the item itself didn't change while the headers did might not be appropriate.
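
A minimal sketch of the item-hash approach described above, assuming the item is a JSON-serializable dict loaded from the database. The names item_etag and can_apply_put are hypothetical, and SHA-256 over a canonical (sorted-key) dump is just one reasonable choice. The PUT check is simplified to a single exact tag comparison.

```python
import hashlib
import json


def item_etag(item: dict) -> str:
    """Strong ETag computed from a canonical JSON dump of the stored item.

    sort_keys and compact separators make the dump deterministic, so the
    same item always yields the same hash regardless of key order.
    """
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f'"{digest}"'  # ETag values are sent as quoted strings


def can_apply_put(stored_item: dict, if_match: str) -> bool:
    """PUT precondition: recompute the ETag from the DB copy and compare.

    Simplified: assumes If-Match carries exactly one strong tag.
    """
    return item_etag(stored_item) == if_match
```

Note that nothing outside the item feeds into the hash: a change to a response header leaves this ETag unchanged, which is exactly the leak the question worries about.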

Response hash

Another approach would be to hash the whole response just before sending it. This makes the ETag layer much cleaner as far as the computation goes. However, on a PUT request, we can't just pull the item from the DB to check the ETag, because we don't have the extra data.
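
A minimal sketch of the response-hash approach, assuming the handler has the final header dict and serialized body in hand just before sending. The function name response_etag is hypothetical. Which headers belong in the hash is a policy decision; including a per-response Date header, for example, would make every ETag unique.

```python
import hashlib
import json


def response_etag(headers: dict, body: bytes) -> str:
    """Strong ETag over the headers and body the client will observe."""
    # Canonicalize headers: lower-case names, sorted, JSON-encoded so that
    # separators inside values cannot make two different header sets collide.
    canonical_headers = json.dumps(
        sorted((name.lower(), value) for name, value in headers.items())
    ).encode("utf-8")
    h = hashlib.sha256()
    h.update(canonical_headers)
    h.update(b"\x00")  # separator between the header block and the body
    h.update(body)
    return f'"{h.hexdigest()}"'
```

The catch raised in the question is visible here: to recompute this tag on a PUT, the server has to rebuild the same headers and body from the current DB state, not just load the item.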

The first approach (item hash) seems appropriate for case 2, concurrency issues. The second approach (hash of the whole payload, including metadata and headers) would be appropriate for case 1, saving bandwidth.

Including every bit of the response (headers too) in the ETag computation seems right, since any change there may be relevant and require the client to refresh its cache. But I don't know how to handle concurrency on PUT or DELETE requests with such an ETag.

From a practical perspective, should we use the item hash or the response hash, and how can we handle both cases with a single one of them?

Kevin Christopher Henry · Accepted Answer

Given your description, I think the response hash is the only option that makes sense here.

First, in order to use conditional requests to avoid the lost update problem, the validators need to be strong. From RFC 7232, Section 3.1 (If-Match):

An origin server MUST use the strong comparison function when comparing entity-tags for If-Match (Section 2.3.2), since the client intends this precondition to prevent the method from being applied if there have been any changes to the representation data.
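
RFC 7232 defines the two comparison functions in Section 2.3.2: strong comparison requires both tags to be strong (no W/ prefix) with identical opaque-tags, while weak comparison only requires the opaque-tags to match. A short sketch of those rules and of If-Match evaluation; the function names are hypothetical, and splitting the header on commas is a simplification that assumes no opaque-tag contains a comma.

```python
from typing import Optional, Tuple


def parse_etag(tag: str) -> Tuple[bool, str]:
    """Split an entity-tag into (is_weak, opaque_tag)."""
    tag = tag.strip()
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags strong and opaque-tags identical."""
    weak_a, opaque_a = parse_etag(a)
    weak_b, opaque_b = parse_etag(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque-tags identical, W/ prefix ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]


def if_match_passes(header: str, current_etag: Optional[str]) -> bool:
    """Evaluate If-Match against the current representation's ETag."""
    if current_etag is None:  # no current representation exists
        return False
    if header.strip() == "*":
        return True
    # Naive split on commas: fine only while opaque-tags contain no commas.
    return any(strong_match(t, current_etag) for t in header.split(","))
```

Because If-Match uses strong comparison, a client echoing back a weak tag such as W/"abc" can never satisfy the precondition, which is why the validator behind a lost-update check has to be strong.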

Strong validators can only have the same value when the representations are bit-for-bit identical. But if, as you say, "additional data may matter" beyond the item, then the item alone does not give you enough information to produce a strong ETag. So you simply could not use an item hash and remain consistent with the specification in that case.

Of course, you could decide that additional data does not matter, in which case you could still do the item hash and be consistent with the specification. But that obviates the one downside you gave for the response hash idea ("we can't just pull the item from DB to check the ETag as we don't have the extra data").

Put differently: you need a strong ETag to avoid lost updates, and strong validators must change "whenever a change occurs to the representation data that would be observable in the payload body of a 200 (OK) response to GET." So to construct the ETag you have to know everything you would know to respond to a GET in any case, so there's no downside to doing it in the response layer.
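
A minimal sketch of that idea, assuming an in-memory dict stands in for the database. DB, render, etag_of, get and put are all hypothetical names. The key point is that put recomputes the tag by calling the same render function that get uses, so every header or piece of metadata a GET would expose is covered automatically. For brevity it ignores If-Match: * and requests that omit If-Match entirely.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store standing in for the real database.
DB: Dict[int, dict] = {1: {"id": 1, "name": "widget"}}


def render(item: dict) -> Tuple[Dict[str, str], bytes]:
    """Response layer: build exactly what a 200 GET would send."""
    body = json.dumps(item, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"Content-Type": "application/json"}, body


def etag_of(headers: Dict[str, str], body: bytes) -> str:
    """Strong ETag over the rendered representation (headers + body)."""
    canonical = json.dumps(sorted((k.lower(), v) for k, v in headers.items()))
    h = hashlib.sha256(canonical.encode("utf-8"))
    h.update(b"\x00" + body)
    return f'"{h.hexdigest()}"'


def opaque(tag: str) -> str:
    """Strip an optional W/ prefix, leaving the quoted opaque-tag."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def get(item_id: int, if_none_match: Optional[str] = None):
    headers, body = render(DB[item_id])
    tag = etag_of(headers, body)
    # If-None-Match uses the weak comparison function (RFC 7232, Section 3.2).
    if if_none_match and any(opaque(t) == tag for t in if_none_match.split(",")):
        return 304, {"ETag": tag}, b""
    return 200, {**headers, "ETag": tag}, body


def put(item_id: int, new_item: dict, if_match: str):
    # Re-render the current state exactly as GET would, then hash it.
    current_tag = etag_of(*render(DB[item_id]))
    # Strong comparison: current_tag is strong, so a W/ tag never equals it.
    if not any(t.strip() == current_tag for t in if_match.split(",")):
        return 412, {}, b""  # Precondition Failed: the resource changed
    DB[item_id] = new_item
    headers, body = render(new_item)
    return 200, {**headers, "ETag": etag_of(headers, body)}, body
```

In use, a client that GETs the item and PUTs with the returned ETag succeeds; a second client still holding the old tag then receives 412 instead of silently overwriting the first update.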
