Tags: database, web-applications, concurrency, race-condition

Reputation: 1693

How do I deal with concurrent changes in a web application?

Here are two potential workflows I would like to perform in a web application.

Variation 1

user sends request
server reads data
server modifies data
server saves modified data

Variation 2:

user sends request
server reads data
server sends data to user
user sends request with modifications
server saves modified data

In each of these cases, I am wondering: what are the standard approaches to ensuring that concurrent access to this service will produce sane results? (i.e. nobody's edit gets clobbered, values correspond to some ordering of the edits, etc.)

The situation is hypothetical, but here are some details of where I would likely need to deal with this in practice:

web application, but language unspecified
potentially, using a web framework
data store is a SQL relational database
the logic involved is too complex to express well in a single query (unlike, e.g., value = value + 1)

I would prefer not to reinvent the wheel here. Surely these are well-known problems with well-known solutions. Please advise.

Thanks.

Upvotes: 9

Answers (4)

Stian Jørgensrud

Reputation: 1044

To answer the question in the title: there is one general solution for dealing with the lost update problem over HTTP.

Let's assume your application consists of two components: a front-end and an API back-end. Two concurrent users try to update the same data.

Optimistic concurrency control

Implement some way of knowing if incoming data is based on the latest update. Some common ways:

Using an ETag. This could be a field stored in db, but could also be computed on each update.
Using a lastUpdated timestamp field stored in db
Using a version field stored in db

The approaches above can be combined with conditional HTTP headers (for example, If-Match). This is useful if your server framework supports them out of the box.

Example:

User 1 makes a GET request, retrieves data + ETag. User 2 makes the same GET request and retrieves the same data.
User 2 updates the data in a PUT request. Note that the ETag is part of the request.
User 1 makes a PUT request and gets an error (HTTP status code 412, Precondition Failed). The API back-end compared the ETag from user 1's PUT request with the ETag newly computed from the data in the db (which now contains user 2's changes), and the two did not match. This means user 1 didn't have the latest changes when sending the PUT request.
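
As a minimal sketch of the example above, assuming a simple version counter serves as the ETag: the class below is an in-memory stand-in for the API back-end and its database, not a real HTTP server. The names VersionedStore, Entry, create, get and put are hypothetical, and the returned integers only mimic the HTTP status codes a conditional PUT would produce.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the API back-end and its database.
class VersionedStore {
    // One stored record: its data plus a version that plays the role of the ETag.
    static final class Entry {
        final String data;
        final long version;

        Entry(String data, long version) {
            this.data = data;
            this.version = version;
        }
    }

    private final Map<String, Entry> rows = new HashMap<>();

    // Seeds a record at version 1.
    synchronized void create(String id, String data) {
        rows.put(id, new Entry(data, 1));
    }

    // GET: returns the data together with its current version (the "ETag").
    synchronized Entry get(String id) {
        return rows.get(id);
    }

    // PUT: saves only if the caller's version matches the stored one.
    // Returns an HTTP-style status: 200 saved, 412 stale version, 404 unknown id.
    synchronized int put(String id, String newData, long expectedVersion) {
        Entry current = rows.get(id);
        if (current == null) {
            return 404;
        }
        if (current.version != expectedVersion) {
            return 412; // someone else saved since the caller's GET
        }
        rows.put(id, new Entry(newData, current.version + 1));
        return 200;
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.create("doc", "original");

        Entry seenByUser1 = store.get("doc");
        Entry seenByUser2 = store.get("doc");

        System.out.println(store.put("doc", "edit by user 2", seenByUser2.version)); // 200
        System.out.println(store.put("doc", "edit by user 1", seenByUser1.version)); // 412
    }
}
```

In a real API the version would typically travel in the ETag response header and come back in If-Match. With a SQL store, the same check is commonly written as an UPDATE whose WHERE clause includes the expected version, treating zero updated rows as the 412 case.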

Other ways to deal with the lost update problem over HTTP

Ignore it

In my short experience, ignoring the lost update problem is a common way to deal with it. Think about the consequences. Ignoring it can be okay if:

Data loss is ok.
Users will never edit the same data. For example, editing a comment on a web page is often restricted to the author of the comment.

Examples of systems that ignore the lost update problem:

Jira (Data Center). Tested by editing the description field on issues.

Change the HTTP API

One can change the API so it doesn't have the lost update problem.

@jay has already mentioned delta updates as a solution. Let's say one has a number field in a model that should be incremented by 1 on each request. One implementation is a PUT endpoint that overwrites the field with the incoming number. This API has the lost update problem. Another implementation is an increment endpoint. This API doesn't have the lost update problem. (The API is RESTful if you view an increment as a resource and create new increments with POST.)

Another way is to change the API from PUT to PATCH. This is not a solution, but it reduces the chance of lost updates, because each request overwrites only the fields it actually changes rather than the whole resource.

Notify front-end about changes on back-end

Server-sent events (SSE) allow the back-end to notify the front-end of changes. SSE is considered built on HTTP (Streaming), though it is stateful.

Use stateful API instead of HTTP

WebSocket is an alternative to HTTP. It provides simultaneous two-way communication between front-end and back-end.

Real-time editing

Changes by one user are streamed to all the other users. This solution is common in online collaborative document editors.

Pessimistic concurrency control

Set a (pessimistic) database lock on the data that is being viewed.

Example: User 1 looks at the data. User 2 tries to look at the same data and gets an error. The system cannot hand out potentially outdated data, so the db has locked the record while user 1 has it open.

Conclusion

Use optimistic concurrency control.

When Googling "lost update problem" one might only get results about databases. Although this is the same problem, database locking doesn't work over HTTP because HTTP is stateless.

Upvotes: 3

Jay

Reputation: 27512

To the best of my knowledge, there is no general solution to the problem.

The root of the problem is that the user may retrieve data and stare at it on the screen for a long time before making an update and saving.

I know of three basic approaches:

When the user reads the database, lock the record, and don't release until the user saves any updates. In practice, this is wildly impractical. What if the user brings up a screen and then goes to lunch without saving? Or goes home for the day? Or is so frustrated trying to update this stupid record that he quits and never comes back?
Express your updates as deltas rather than destinations. To take the classic example, suppose you have a system that records stock in inventory. Every time there is a sale, you must subtract 1 (or more) from the inventory count.

So say the present quantity on hand is 10. User A creates a sale. Current quantity = 10. User B creates a sale. He also gets current quantity = 10. User A enters that two units are sold. New quantity = 10 - 2 = 8. Save. User B enters one unit sold. New quantity = 10 (the value he loaded) - 1 = 9. Save. Clearly, something went wrong: three units were sold in total, so the quantity should be 7.

Solution: Instead of writing "update inventory set quantity=9 where itemid=12345", write "update inventory set quantity=quantity-1 where itemid=12345". Then let the database queue the updates. This is very different from strategy #1, as the database only has to lock the record long enough to read it, make the update, and write it. It doesn't have to wait while someone stares at the screen.

Of course, this is only useable for changes that can be expressed as a delta. If you are, say, updating the customer's phone number, it's not going to work. (Like, old number is 555-1234. User A says to change it to 555-1235. That's a change of +1. User B says to change it to 555-1243. That's a change of +9. So total change is +10, the customer's new number is 555-1244. :-) ) But in cases like that, "last user to click the enter key wins" is probably the best you can do anyway.

On update, check that relevant fields in the database match your "from" value. For example, say you work for a law firm negotiating contracts for your clients. You have a screen where a user can enter notes about negotiations. User A brings up a contract record. User B brings up the same contract record. User A enters that he just spoke to the other party on the phone and they are agreeable to the proposed terms. User B, who has also been trying to call the other party, enters that they are not responding to phone calls and he suspects they are stonewalling. User A clicks save. Do we want user B's comments to overwrite user A's? Probably not. Instead we display a message indicating that the notes have been changed since he read the record, and allowing him to see the new value before deciding whether to proceed with the save, abort, or enter something different.
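
The delta approach and the check-the-"from"-value approach can be replayed in a few lines. This is only an in-memory sketch: ApproachesDemo, InventoryRow and ContractNotes are hypothetical names standing in for database rows, and the synchronized methods play the role of the short lock a database holds while it executes a single UPDATE.

```java
class ApproachesDemo {
    // Hypothetical stand-in for one row of the inventory table.
    static final class InventoryRow {
        private int quantity;

        InventoryRow(int quantity) {
            this.quantity = quantity;
        }

        synchronized int read() {
            return quantity;
        }

        // Like "update inventory set quantity=N": writes a destination value.
        synchronized void setQuantity(int value) {
            quantity = value;
        }

        // Like "update inventory set quantity=quantity-N": applies a delta.
        synchronized void sell(int units) {
            quantity -= units;
        }
    }

    // Hypothetical stand-in for the notes field of a contract record.
    static final class ContractNotes {
        private String notes;

        ContractNotes(String notes) {
            this.notes = notes;
        }

        synchronized String read() {
            return notes;
        }

        // Saves only if the stored value still equals what the user loaded,
        // like an UPDATE whose WHERE clause also matches the old notes.
        synchronized boolean saveIfUnchanged(String loaded, String updated) {
            if (!notes.equals(loaded)) {
                return false; // changed since it was read: let the user decide
            }
            notes = updated;
            return true;
        }
    }

    public static void main(String[] args) {
        // Destination writes: both users load 10, then each saves his own result.
        InventoryRow byDestination = new InventoryRow(10);
        int loadedByA = byDestination.read();
        int loadedByB = byDestination.read();
        byDestination.setQuantity(loadedByA - 2);
        byDestination.setQuantity(loadedByB - 1);
        System.out.println(byDestination.read()); // 9, user A's sale is lost

        // Delta writes: each sale is applied to whatever is currently stored.
        InventoryRow byDelta = new InventoryRow(10);
        byDelta.sell(2);
        byDelta.sell(1);
        System.out.println(byDelta.read()); // 7

        // "From" check: user B's save is refused because user A saved first.
        ContractNotes contract = new ContractNotes("");
        String seenByA = contract.read();
        String seenByB = contract.read();
        System.out.println(contract.saveIfUnchanged(seenByA, "Other party agrees."));          // true
        System.out.println(contract.saveIfUnchanged(seenByB, "Other party is stonewalling.")); // false
    }
}
```

When saveIfUnchanged returns false, the application would show user B the current notes and let him decide whether to proceed, abort, or enter something different.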

[Note: the forum is automatically renumbering my numbered lists. I'm not sure how to override this.]

Upvotes: 10

Bozho

Reputation: 597402

Things are simple in the application layer - every request is served by a different thread (or process), so unless you have state in your processing classes (services), everything is safe.

Things get more complicated when you reach the database - i.e. where the state is held. There you need transactions to ensure that everything is ok.

Transactions have a set of properties, known as ACID, that "guarantee database transactions are processed reliably".

Upvotes: 0

Milhous

Reputation: 14653

If you do not have transactions in MySQL, you can use a conditional UPDATE statement to ensure that the data is not corrupted.

UPDATE tableA SET status = 2 WHERE status = 1

If status is 1, only one process will get the result that a record was updated. The code below returns the number of rows updated: 0 means no row matched (another process got there first), and -1 means the statement failed with an exception.

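import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// s is the UPDATE statement shown above.
// Returns the number of rows updated, or -1 if the statement failed.
static int runUpdate(Connection connection, String s)
{
    int rows = -1;
    // try-with-resources closes the statement even if executeUpdate throws
    try (PreparedStatement query = connection.prepareStatement(s))
    {
        rows = query.executeUpdate();
    }
    catch (SQLException e)
    {
        e.printStackTrace();
    }
    return rows;
}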
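
To illustrate why only one process sees a row updated, here is an in-memory analogue, not MySQL itself: AtomicInteger.compareAndSet(1, 2) stands in for the conditional UPDATE, and threads stand in for competing processes. ClaimDemo and countWinners are hypothetical names chosen for this sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class ClaimDemo {
    // Starts `workers` threads that all try to move status from 1 to 2
    // and returns how many of them succeeded.
    static int countWinners(int workers) {
        AtomicInteger status = new AtomicInteger(1);
        AtomicInteger winners = new AtomicInteger(0);
        CountDownLatch start = new CountDownLatch(1);

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // hold every thread until all are ready
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // Analogue of: UPDATE tableA SET status = 2 WHERE status = 1
                if (status.compareAndSet(1, 2)) {
                    winners.incrementAndGet(); // this thread "updated a row"
                }
            });
            threads[i].start();
        }

        start.countDown(); // release all threads at once
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println(countWinners(8)); // 1
    }
}
```

However the threads interleave, exactly one compareAndSet succeeds. The database gives the same guarantee for a single-row conditional UPDATE, which is why the update count can be used to tell the winner from the losers.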