Reputation: 47
Let's assume a relational DB-backed API. I understand the usage of optimistic concurrency control (e.g. by using a version field) to prevent lost updates from clients, where the read-update-write cycle is performed by the client. But let's imagine a PATCH request sent by a client; the client need not have any previous knowledge of the resource (or only minimal knowledge, e.g. its ID) and just sends the data to update (e.g. in JSON Patch format, or whatever).
For the purposes of this discussion, let's imagine that the resource has a field that is updated according to some business logic rules, for example it's a set of fingerprints, and each client should be able to add its own fingerprint to the set, without deleting the existing ones (it's more complex than this, but this should provide an idea). So the order in which the requests are evaluated is not critical, but it's definitely critical that no request be lost, otherwise some client would not have its fingerprint in the set.
When it receives the request, to apply the patch the server must read the resource from the backend (to check whether it exists and to fetch any values that need to be merged/patched), apply the requested changes, and write back the result, all on the server side without any client involvement. It's not difficult to see that such a sequence of operations is racy and can easily lead to lost updates if multiple clients send PATCH requests for the same resource and fields at the same time.
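To make the race concrete, here's a minimal sketch of that naive server-side flow, assuming a hypothetical resources table with a fingerprints column stored as JSON text and a psycopg2-style (PEP 249) connection:

```python
import json

def add_fingerprint_naive(conn, resource_id, fingerprint):
    """Racy read-modify-write: two concurrent calls can both read the same
    row, each add its own fingerprint, and the second write then silently
    overwrites the first (a lost update)."""
    cur = conn.cursor()
    cur.execute("SELECT fingerprints FROM resources WHERE id = %s", (resource_id,))
    row = cur.fetchone()
    if row is None:
        raise LookupError("resource not found")

    fingerprints = set(json.loads(row[0]))   # read ...
    fingerprints.add(fingerprint)            # ... modify ...
    cur.execute(                             # ... write back (races with other writers)
        "UPDATE resources SET fingerprints = %s WHERE id = %s",
        (json.dumps(sorted(fingerprints)), resource_id),
    )
    conn.commit()
```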
How to prevent this? I can think of a couple of strategies:
- Taking a pessimistic lock (SELECT ... FOR UPDATE) on the resource throughout the read-patch-write operation (a sketch is included below)

Anything I'm missing here? Any additional strategies? Are these methods used in the real world?
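For concreteness, here's what I have in mind for the locking option (a rough sketch, assuming PostgreSQL with a psycopg2-style connection and the same hypothetical resources table as above):

```python
import json

def add_fingerprint_locked(conn, resource_id, fingerprint):
    """Pessimistic locking: the row is locked for the whole read-patch-write
    cycle, so concurrent PATCHes queue up instead of overwriting each other."""
    with conn:                      # commits on success, rolls back on error
        cur = conn.cursor()
        cur.execute(
            "SELECT fingerprints FROM resources WHERE id = %s FOR UPDATE",
            (resource_id,),
        )
        row = cur.fetchone()
        if row is None:
            raise LookupError("resource not found")

        fingerprints = set(json.loads(row[0]))
        fingerprints.add(fingerprint)
        cur.execute(
            "UPDATE resources SET fingerprints = %s WHERE id = %s",
            (json.dumps(sorted(fingerprints)), resource_id),
        )
```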
Upvotes: -1
Views: 155
Reputation: 29291
An answer from me based on a system I used to work with.
BANKING USE CASE
We provided a system where traders at banks could edit order details in an editable grid. These edits might occur concurrently.
We designed optimistic concurrency at the field level, so that conflicts were detected per field rather than per row.
We also added a Last Changed By field that both users could see. So if banker 1 set a value of $2000 and then saw a value of $3000, they would also see who set that value. The two bankers could talk to each other to reconcile the correct value.
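As a rough sketch (not our actual code), field-level optimistic concurrency can be expressed as a conditional update per field, with a per-field version counter and a last-changed-by column; all names here are hypothetical and the connection is assumed to be a psycopg2-style one:

```python
def update_order_price(conn, order_id, new_price, expected_price_version, user):
    """Field-level optimistic concurrency (sketch): the write is rejected
    only if someone else changed this particular field in the meantime."""
    cur = conn.cursor()
    cur.execute(
        """
        UPDATE orders
           SET price = %s,
               price_version = price_version + 1,
               price_last_changed_by = %s
         WHERE id = %s
           AND price_version = %s
        """,
        (new_price, user, order_id, expected_price_version),
    )
    conflict = (cur.rowcount == 0)
    conn.commit()
    if conflict:
        # Re-read the field, show the current value and who set it,
        # and let the two users reconcile.
        return False
    return True
```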
LESSONS LEARNED
Database technologies sometimes want to update all fields in a row by default, which can silently overwrite concurrent changes and lose data. To prevent that, ensure that you only write the changed fields in your SQL UPDATE statements. To achieve that you typically need to compare the existing data with the received data in your APIs, as in the sketch below.
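A sketch of what that comparison can look like, independent of any particular framework (table and column names are whatever your schema uses, and are assumed to come from trusted code, not user input):

```python
def build_partial_update(table, row_id, existing, incoming):
    """Compare existing data to received data and produce an UPDATE that
    names only the columns whose values actually changed."""
    changed = {k: v for k, v in incoming.items()
               if k in existing and existing[k] != v}
    if not changed:
        return None, ()
    assignments = ", ".join(f"{col} = %s" for col in changed)
    sql = f"UPDATE {table} SET {assignments} WHERE id = %s"
    params = tuple(changed.values()) + (row_id,)
    return sql, params


# Example:
# build_partial_update("orders", 42,
#                      existing={"price": 2000, "qty": 10},
#                      incoming={"price": 3000, "qty": 10})
# -> ("UPDATE orders SET price = %s WHERE id = %s", (3000, 42))
```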
UI design and the way people interact can play a significant part in data design. For example, editable grids can add a lot of complexity, so use them sparingly.
Upvotes: 0
Reputation: 57307
Anything I'm missing here? Any additional strategies?
Does YOLO count as an additional strategy?
As far as I know, if you have requests being handled in parallel, with no coordination between them, and you want something like "first writer wins" semantics with no surprises, then in the general case you need locking or you need compare-and-swap semantics provided by your persistent storage.
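Using the fingerprint example from the question, a minimal sketch of the compare-and-swap flavour, assuming a version column on a hypothetical resources table and a psycopg2-style connection; the conditional UPDATE only succeeds if nobody else wrote in between, otherwise the server re-reads and retries:

```python
import json

def add_fingerprint_cas(conn, resource_id, fingerprint, max_attempts=5):
    """Compare-and-swap: the UPDATE succeeds only if the row still has the
    version we read; otherwise another writer got there first and we retry."""
    cur = conn.cursor()
    for _ in range(max_attempts):
        cur.execute(
            "SELECT fingerprints, version FROM resources WHERE id = %s",
            (resource_id,),
        )
        row = cur.fetchone()
        if row is None:
            raise LookupError("resource not found")
        fingerprints, version = set(json.loads(row[0])), row[1]
        fingerprints.add(fingerprint)

        cur.execute(
            "UPDATE resources SET fingerprints = %s, version = version + 1 "
            "WHERE id = %s AND version = %s",
            (json.dumps(sorted(fingerprints)), resource_id, version),
        )
        won = (cur.rowcount == 1)
        conn.commit()
        if won:
            return                  # our write landed
        # else: someone else updated the row first; loop and retry
    raise RuntimeError("too many concurrent updates, giving up")
```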
(There's nothing particularly HTTP/REST about this constraint, it's just a physics of the world thing.)
In cases where the order that you write things down isn't critical, you might be able to get away without the locking. Conflict-free replicated data types are one example: if you can treat your information as a set of changes rather than a sequence of changes, anyone looking at the same set of changes will converge on the same "state".
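For the fingerprint example in the question, that can be as simple as storing each fingerprint as its own row, so that every PATCH becomes a commutative, idempotent insert and no read-modify-write cycle (and no lock) is needed. A sketch, assuming PostgreSQL's ON CONFLICT and a hypothetical resource_fingerprints table with a unique (resource_id, fingerprint) constraint:

```python
def add_fingerprint(conn, resource_id, fingerprint):
    """Set-of-changes approach: inserts commute, and replaying the same
    request is harmless, so concurrent PATCHes cannot lose each other."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO resource_fingerprints (resource_id, fingerprint) "
        "VALUES (%s, %s) ON CONFLICT DO NOTHING",
        (resource_id, fingerprint),
    )
    conn.commit()
```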
Upvotes: 0