Reputation: 47
Let's assume a relational DB-backed API. I understand the usage of optimistic concurrency control (e.g. by using a version field) to prevent lost updates from clients, where the read-update-write cycle is performed by the client. But let's imagine a PATCH request sent by a client; the client need not have any previous knowledge of the resource (or only minimal knowledge, e.g. its ID) and just sends the data to update (e.g. in JSON Patch format, or whatever).
For the purposes of this discussion, let's imagine that the resource has a field that is updated according to some business logic rules, for example it's a set of fingerprints, and each client should be able to add its own fingerprint to the set, without deleting the existing ones (it's more complex than this, but this should provide an idea). So the order in which the requests are evaluated is not critical, but it's definitely critical that no request be lost, otherwise some client would not have its fingerprint in the set.
When it receives the request, to apply the patch the server must read the resource from the backend (to check whether it exists and to fetch any values that need to be merged/patched), apply the requested changes, and write back the result, all on the server side without any client involvement. It's not difficult to see that such a sequence of operations is racy and can easily lead to lost updates if multiple clients send PATCH requests for the same resource and fields at the same time.
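To make the race concrete, here's a minimal sketch of that naive server-side flow, assuming a hypothetical resources table with a fingerprints column stored as JSON text and a psycopg2-style (PEP 249) connection:

```python
import json

def add_fingerprint_naive(conn, resource_id, fingerprint):
    """Racy read-modify-write: two concurrent calls can both read the same
    row, each add its own fingerprint, and the second write then silently
    overwrites the first (a lost update)."""
    cur = conn.cursor()
    cur.execute("SELECT fingerprints FROM resources WHERE id = %s", (resource_id,))
    row = cur.fetchone()
    if row is None:
        raise LookupError("resource not found")

    fingerprints = set(json.loads(row[0]))   # read ...
    fingerprints.add(fingerprint)            # ... modify ...
    cur.execute(                             # ... write back (races with other writers)
        "UPDATE resources SET fingerprints = %s WHERE id = %s",
        (json.dumps(sorted(fingerprints)), resource_id),
    )
    conn.commit()
```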
How to prevent this? I can think of a couple of strategies:
- Taking a pessimistic lock (SELECT ... FOR UPDATE) on the resource throughout the read-patch-write operation (a sketch is included below)

Anything I'm missing here? Any additional strategies? Are these methods used in the real world?
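For concreteness, here's what I have in mind for the locking option (a rough sketch, assuming PostgreSQL with a psycopg2-style connection and the same hypothetical resources table as above):

```python
import json

def add_fingerprint_locked(conn, resource_id, fingerprint):
    """Pessimistic locking: the row is locked for the whole read-patch-write
    cycle, so concurrent PATCHes queue up instead of overwriting each other."""
    with conn:                      # commits on success, rolls back on error
        cur = conn.cursor()
        cur.execute(
            "SELECT fingerprints FROM resources WHERE id = %s FOR UPDATE",
            (resource_id,),
        )
        row = cur.fetchone()
        if row is None:
            raise LookupError("resource not found")

        fingerprints = set(json.loads(row[0]))
        fingerprints.add(fingerprint)
        cur.execute(
            "UPDATE resources SET fingerprints = %s WHERE id = %s",
            (json.dumps(sorted(fingerprints)), resource_id),
        )
```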
Upvotes: -1
Views: 155
Reputation: 29291
An answer from me based on a system I used to work with.
BANKING USE CASE
We provided a system where traders at banks could edit order details in an editable grid. These edits might occur concurrently.
We designed optimistic concurrency at the field level, so that conflicts were detected per field rather than per row.
We also added a Last Changed By field that both users could see. So if banker 1 set a value of $2000 and then saw a value of $3000, they would also see who set that value. The two bankers could talk to each other to reconcile the correct value.
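As a rough sketch (not our actual code), field-level optimistic concurrency can be expressed as a conditional update per field, with a per-field version counter and a last-changed-by column; all names here are hypothetical and the connection is assumed to be a psycopg2-style one:

```python
def update_order_price(conn, order_id, new_price, expected_price_version, user):
    """Field-level optimistic concurrency (sketch): the write is rejected
    only if someone else changed this particular field in the meantime."""
    cur = conn.cursor()
    cur.execute(
        """
        UPDATE orders
           SET price = %s,
               price_version = price_version + 1,
               price_last_changed_by = %s
         WHERE id = %s
           AND price_version = %s
        """,
        (new_price, user, order_id, expected_price_version),
    )
    conflict = (cur.rowcount == 0)
    conn.commit()
    if conflict:
        # Re-read the field, show the current value and who set it,
        # and let the two users reconcile.
        return False
    return True
```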
LESSONS LEARNED
Database technologies sometimes want to update all fields in a row by default, which can silently overwrite concurrent changes and lose data. To prevent that, ensure that you only write the changed fields in your SQL UPDATE statements. To achieve that you typically need to compare the existing data with the received data in your APIs, as in the sketch below.
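A sketch of what that comparison can look like, independent of any particular framework (table and column names are whatever your schema uses, and are assumed to come from trusted code, not user input):

```python
def build_partial_update(table, row_id, existing, incoming):
    """Compare existing data to received data and produce an UPDATE that
    names only the columns whose values actually changed."""
    changed = {k: v for k, v in incoming.items()
               if k in existing and existing[k] != v}
    if not changed:
        return None, ()
    assignments = ", ".join(f"{col} = %s" for col in changed)
    sql = f"UPDATE {table} SET {assignments} WHERE id = %s"
    params = tuple(changed.values()) + (row_id,)
    return sql, params


# Example:
# build_partial_update("orders", 42,
#                      existing={"price": 2000, "qty": 10},
#                      incoming={"price": 3000, "qty": 10})
# -> ("UPDATE orders SET price = %s WHERE id = %s", (3000, 42))
```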
UI design and the way people interact can play a significant part in data design. For example, editable grids can add a lot of complexity, so use them sparingly.
Upvotes: 0
Reputation: 57307
Anything I'm missing here? Any additional strategies?
Does YOLO count as an additional strategy?
As far as I know, if you have requests being handled in parallel, with no coordination between them, and you want something like "first writer wins" semantics with no surprises, then in the general case you need locking or you need compare-and-swap semantics provided by your persistent storage.
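Using the fingerprint example from the question, a minimal sketch of the compare-and-swap flavour, assuming a version column on a hypothetical resources table and a psycopg2-style connection; the conditional UPDATE only succeeds if nobody else wrote in between, otherwise the server re-reads and retries:

```python
import json

def add_fingerprint_cas(conn, resource_id, fingerprint, max_attempts=5):
    """Compare-and-swap: the UPDATE succeeds only if the row still has the
    version we read; otherwise another writer got there first and we retry."""
    cur = conn.cursor()
    for _ in range(max_attempts):
        cur.execute(
            "SELECT fingerprints, version FROM resources WHERE id = %s",
            (resource_id,),
        )
        row = cur.fetchone()
        if row is None:
            raise LookupError("resource not found")
        fingerprints, version = set(json.loads(row[0])), row[1]
        fingerprints.add(fingerprint)

        cur.execute(
            "UPDATE resources SET fingerprints = %s, version = version + 1 "
            "WHERE id = %s AND version = %s",
            (json.dumps(sorted(fingerprints)), resource_id, version),
        )
        won = (cur.rowcount == 1)
        conn.commit()
        if won:
            return                  # our write landed
        # else: someone else updated the row first; loop and retry
    raise RuntimeError("too many concurrent updates, giving up")
```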
(There's nothing particularly HTTP/REST about this constraint, it's just a physics of the world thing.)
In cases where the order that you write things down isn't critical, you might be able to get away without the locking. Conflict-free replicated data types are one example: if you can treat your information as a set of changes rather than a sequence of changes, anyone looking at the same set of changes will converge on the same "state".
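For the fingerprint example in the question, that can be as simple as storing each fingerprint as its own row, so that every PATCH becomes a commutative, idempotent insert and no read-modify-write cycle (and no lock) is needed. A sketch, assuming PostgreSQL's ON CONFLICT and a hypothetical resource_fingerprints table with a unique (resource_id, fingerprint) constraint:

```python
def add_fingerprint(conn, resource_id, fingerprint):
    """Set-of-changes approach: inserts commute, and replaying the same
    request is harmless, so concurrent PATCHes cannot lose each other."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO resource_fingerprints (resource_id, fingerprint) "
        "VALUES (%s, %s) ON CONFLICT DO NOTHING",
        (resource_id, fingerprint),
    )
    conn.commit()
```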
Upvotes: 0