progNewbie

Reputation: 4822

How to handle race conditions in Web Service?

I implemented a Web Service with Java Servlets.

I have the following setup: a database holds 'job' entries. Each job has a status such as 'executing', 'in queue', or 'finished'. When a user starts a new job, an entry is created in the database with the status 'in queue'.

The job should only be executed if fewer than five other jobs are currently executing. If five others are already executing, the status needs to stay 'in queue' and a cron job will handle the execution of this job later.

If fewer than five jobs are executing at the moment, my script will execute this job. But what if, between my script asking the database how many jobs are executing and the script starting the job, another request from another user creates a job and also gets 'four executing jobs' as the result from the database?

Then there would be a race condition and 6 jobs would be executed.

How can I prevent something like that? Any advice? Thank you very very much!

Upvotes: 9

Views: 2847

Answers (7)

Alexander Petrov

Reputation: 9492

In my opinion, even if you don't use ExecutorService, it will be easiest to achieve your logic if you always update the database and start your jobs from a single thread. You can arrange the execution of your jobs in a queue and have one thread execute them and update the database status accordingly.

If you want to control the number of jobs executing, one way to do this is to use an ExecutorService with a fixed thread pool of 5. This way you will know for sure that at most 5 jobs are executing at once. All other jobs will be queued within the ExecutorService.
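A minimal sketch of the fixed-pool idea, assuming the jobs can be expressed as Runnable objects. The class JobRunner and its method names are made up for illustration; only Executors.newFixedThreadPool and the other java.util.concurrent calls are real API. The peak counter is there only to make the limit observable.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: the class name, method names and the limit of 5 are illustrative.
class JobRunner {
    static final int MAX_PARALLEL_JOBS = 5;

    private final ExecutorService pool = Executors.newFixedThreadPool(MAX_PARALLEL_JOBS);
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    // Called from the servlet. Returns immediately; jobs beyond the fifth
    // wait in the pool's internal queue until a worker thread is free.
    void submit(Runnable job) {
        pool.execute(() -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            try {
                job.run(); // a real job would also update its status in the database here
            } finally {
                running.decrementAndGet();
            }
        });
    }

    // Highest number of jobs observed running at the same time.
    int peakParallelism() {
        return peak.get();
    }

    // Stops accepting new jobs and waits for the queued ones to finish.
    boolean shutdownAndWait(long seconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape the servlet never has to count executing jobs itself, so the check-then-start race from the question does not arise inside a single JVM. It does not help if several servers share the same database.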

Some of my colleagues will point you to low-level concurrency APIs. I believe that these are not meant for fixing general programming issues. Whatever you decide to do, try to use a higher-level API and don't dig into the details. Most of the low-level stuff is already implemented within the existing frameworks, and I doubt you will do it better.

Upvotes: 0

alros

Reputation: 146

The answer is implicit in your question: your requests have to be enqueued, so build a FIFO queue with producers and consumers.

The servlet always adds jobs to the queue (optionally checking whether it is full), and 5 other threads each extract one job at a time, or sleep if the queue is empty.

There's no need to use cron or a mutex for this. Just remember to synchronize the queue, or the consumers may extract the same job twice.
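A rough sketch of such a producer/consumer setup, assuming jobs are Runnable objects. JobQueue and its methods are invented names. The synchronization is delegated to java.util.concurrent.LinkedBlockingQueue, which is thread-safe, so take() hands each job to exactly one consumer.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: class and method names are illustrative.
class JobQueue {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread[] workers;

    JobQueue(int workerCount) {
        workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(this::consume, "job-worker-" + i);
            workers[i].setDaemon(true);
            workers[i].start();
        }
    }

    // Producer side (the servlet): enqueue and return immediately.
    void enqueue(Runnable job) {
        queue.add(job);
    }

    // Consumer side: take() blocks while the queue is empty.
    private void consume() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Runnable job = queue.take();
                try {
                    job.run();
                } catch (RuntimeException e) {
                    // A failing job must not kill the worker; real code would log this.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // interrupted while waiting: stop this worker
        }
    }

    // Interrupts all workers so they exit their loops.
    void stop() {
        for (Thread t : workers) {
            t.interrupt();
        }
    }
}
```

Creating it with new JobQueue(5) gives the five consumers described above. Jobs waiting in this queue live only in memory, so they are lost on a restart unless they are also recorded in the database.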

Upvotes: 1

Milton Hernandez

Reputation: 664

As other people have responded, this situation calls for a Semaphore or Mutex. The one area where I think you may want to be careful is where the authoritative Mutex lives. Depending on the situation, you could have several different optimal solutions (trading off security versus performance/complexity):

a) If you will have only one server (non-clustered), and the only use case for modifying the database is through your servlet, then you could implement a static in-memory Mutex (some common object that you can synchronize access against). This will have the least impact on performance, and would be the easiest to maintain (because all the relevant code is in your project). Also, it doesn't depend on the idiosyncrasies of the specific database you are using. It also allows you to lock access to non-database objects.

b) If you will have several separate servers, but they are all instances of your code, you could implement a Synchronization Service that allows the specific instance to obtain the lock (probably with a timeout) before it is allowed to update the database. This will be a bit more complex, but all the logic will still reside in your code, and the solution will be portable across database types.
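For option a), a minimal sketch of a static in-memory mutex could look like the following. It assumes a single server and that this class is the only code that starts jobs. JobSlots, tryStart and finish are hypothetical names.

```java
// Hypothetical sketch: only valid on a single, non-clustered server.
class JobSlots {
    static final int MAX_RUNNING = 5;

    private static final Object LOCK = new Object();
    private static int running = 0;

    // The check and the increment happen under one lock, so two requests
    // cannot both observe "4 running" and both start a job.
    static boolean tryStart() {
        synchronized (LOCK) {
            if (running >= MAX_RUNNING) {
                return false; // leave the job 'in queue'
            }
            running++;
            return true; // the caller may now set the status to 'executing'
        }
    }

    // Call from a finally block when the job ends, whether it succeeded or not.
    static void finish() {
        synchronized (LOCK) {
            if (running > 0) {
                running--;
            }
        }
    }
}
```

The counter lives in memory only, so after a restart it would have to be re-initialised from the number of jobs the database still marks as 'executing'.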

c) If your database can be either updated by your server or by a different back-end process (for example an ETL), then the only way is to implement record level locking in the DB. If you do this, you will be dependent on the specific type of support your database provides and will probably require changes if you happen to port to a different DB. In my opinion, this is the most-complex, least maintainable option, and it should only be taken if the conditions for c) are unambiguously true.

Upvotes: 1

harmonica141

Reputation: 1469

Guy Grin is right, what you are calling for is a mutual exclusion situation that can be solved with semaphores. This construct by Dijkstra should solve your problem.

This construct is usually intended for code that can be executed by only one process at a time. Example situations are exactly what you seem to be facing, e.g. database transactions that need to make sure you do not run into lost updates or dirty reads. Why exactly is it that you want 5 simultaneous executions? Are you sure you do not run into exactly those problems when you allow simultaneous execution at all?

The basic idea is to have a so-called critical section in your code that has to be protected from race conditions, i.e. one that needs mutual exclusion handling. A party that wants to enter the critical section while it is occupied has to wait(). As soon as the party inside is done doing its magic, it calls notify(), and an internal handler then allows the next one in line to execute the critical section.

But:

  • I highly recommend not implementing ANY mutual exclusion handling approach yourself. In a theoretical computer science class some years ago we analyzed these constructions at OS level and proved what can go wrong. Even if it looks simple at first glance, there is more to it than meets the eye, and depending on the language it is really hard to get right if you do it yourself, especially in Java and related languages where you have no control over what the underlying VM is doing. Instead, there are pre-implemented out-of-the-box solutions that are already tested and proven correct.

  • Before handling mutual exclusion in a production environment, read a bit about it and be sure to understand what it implies. E.g. there is The Little Book of Semaphores, which is a well-written and nice-to-read reference. At least have a glance at it.

I am not quite sure about Java Servlets, but Java does have an out-of-the-box solution for mutual exclusion in a keyword called synchronized, which marks critical sections in your code that are not allowed to be executed simultaneously by several threads. There will be no need for external libraries.

A nice sample code is provided in this earlier post on SO. Although it is already stated there let me remind you to really use notifyAll() if you handle several producers / consumers otherwise weird things will happen and wild processes spinning in starvation will come and kill your cat.

Another bigger tutorial on the topic can be found here.

Upvotes: 1

Guy Grin

Reputation: 2034

If I understand correctly and you have control over the application layer that makes the requests to the DB you could use Semaphores to control who is accessing the DB.

Semaphores, in a way, are like traffic lights. They give access to the critical code to only N threads at a time. So you could set N to 5, and allow only the threads inside the critical code to change their job's status to 'executing', etc.
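As a small illustration, here is a sketch built on java.util.concurrent.Semaphore. The wrapper class JobGate and its methods are made-up names, and it assumes the job runs on the calling thread. tryAcquire() does not block: if no permit is free it returns false, which matches the requirement that the job simply stays 'in queue'.

```java
import java.util.concurrent.Semaphore;

// Hypothetical wrapper: class and method names are illustrative.
class JobGate {
    private final Semaphore slots;

    JobGate(int maxRunning) {
        slots = new Semaphore(maxRunning);
    }

    // Runs the job on the calling thread if a permit is free.
    // Returns false without running it when all permits are taken.
    boolean runIfSlotFree(Runnable job) {
        if (!slots.tryAcquire()) {
            return false; // job stays 'in queue'
        }
        try {
            job.run();
            return true;
        } finally {
            slots.release(); // always give the permit back
        }
    }

    int freeSlots() {
        return slots.availablePermits();
    }
}
```

A servlet would share a single new JobGate(5) between all requests. Like any in-memory construct, this only limits the jobs started by one JVM.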

Here is a nice tutorial about using them.

Upvotes: 7

Tantowi Mustofa

Reputation: 687

You can use record locking to control concurrency. One way to do it is by executing a "select for update" query.

Your application must have another table that stores worker_cnt. Your servlet must then do the following:

  1. Get the database connection

  2. Turn off auto commit

  3. Insert the job with 'IN QUEUE' status

  4. Execute "select worker_cnt from ... for update" query.

(at this point other users that execute the same query will have to wait until we commit)

  5. Read worker_cnt value

  6. If worker_cnt >= 5 commit and quit.

(at this point you get the ticket to execute the job, but other users still waiting)

  7. Update the job to 'EXECUTING'

  8. Increment worker_cnt

  9. commit.

(at this point other users can continue their query and will get updated worker_cnt)

  10. do execute the job

  11. Update the job to 'FINISHED'

  12. Decrement worker_cnt

  13. commit again

  14. close the database connection
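The locking part of the steps above (the "select for update" through the first commit) could be sketched with plain JDBC roughly as follows. The schema is an assumption made for this example: a jobs(id, status) table and a single-row worker_counter(worker_cnt) table. It also assumes a database that supports SELECT ... FOR UPDATE (for example PostgreSQL, Oracle, or MySQL with InnoDB) and that the job row was already inserted with 'IN QUEUE'.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical schema: jobs(id, status) and a single-row table worker_counter(worker_cnt).
class JobClaimer {
    static final int MAX_WORKERS = 5;

    static boolean hasFreeSlot(int workerCnt) {
        return workerCnt < MAX_WORKERS;
    }

    // Returns true if the job was switched to 'EXECUTING', false if it must stay queued.
    static boolean tryClaim(Connection con, long jobId) throws SQLException {
        con.setAutoCommit(false);
        try {
            int workerCnt;
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT worker_cnt FROM worker_counter FOR UPDATE")) {
                // Other transactions running the same query now wait until we commit.
                workerCnt = rs.next() ? rs.getInt(1) : MAX_WORKERS; // no row: treat as full
            }
            if (!hasFreeSlot(workerCnt)) {
                con.commit(); // releases the row lock; job stays 'IN QUEUE'
                return false;
            }
            try (PreparedStatement job = con.prepareStatement(
                         "UPDATE jobs SET status = 'EXECUTING' WHERE id = ?");
                 Statement inc = con.createStatement()) {
                job.setLong(1, jobId);
                job.executeUpdate();
                inc.executeUpdate("UPDATE worker_counter SET worker_cnt = worker_cnt + 1");
            }
            con.commit(); // waiting transactions now see the incremented counter
            return true;
        } catch (SQLException e) {
            con.rollback();
            throw e;
        }
    }
}
```

After the job has run, a second short transaction would set the status to 'FINISHED' and decrement worker_cnt, ideally from a finally block so that a failed job does not keep its slot forever.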

Upvotes: 4

davidxxx

Reputation: 131346

Edit: I understand your question now, so here is another response :)

Yes, you could have race conditions. You could use a database lock to handle them. If the record is not often accessed concurrently, look at pessimistic locking. If the record is often accessed concurrently, look at optimistic locking.

Upvotes: 1
