vivek.m
vivek.m

Reputation: 3341

Concurrent task picking from database

Inside a database table, I keep a list of tasks. For simplicity, assume task is an integer and let there be 100 integers in increasing order in table. (Increasing order means if someone is working on task N, then all tasks < N are already being worked upon)

I also have 5 clients that connect and pick tasks from database and update the database when task is finished.

I do not want any two clients to pick the same task.

Whenever a task is picked, I add it to another table, 'tasksTable'. When picking a new task, I find max_int in 'tasksTable', and pick the task = max_int+1

To avoid work duplication, I serialize the process of picking tasks i.e.

getlock
read max_int
pick task
update tasksTable
releaselock

Since I had only about 10 workers, serializing was not much of issue. What if I have 1000's of clients. How can I parallelize the task picking?

Upvotes: 1

Views: 357

Answers (4)

usr
usr

Reputation: 171216

I think your question is mainly about increasing concurrency. You already have a safe algorithm. Now you want scale.

Some obvious solutions have the problem of serializing access to the table by locking the same portion of it. Be careful here and test for that.

Task queueing is usually implemented with READPAST. This allows a query to read past already locked rows, allowing to dequeue the next task while previous tasks are currently being marked for being in processing.

You can achieve exclusive access to a certain row by updating it, or by using lock hints like XLOCK, ROWLOCK, HOLDLOCK.

There is a set of articles by Rusanu Remus on queue tables that answers about everything. Feel free to ask follow-up question in the comments below this post.

Upvotes: 0

Tracy McKibben
Tracy McKibben

Reputation: 306

Assume two tables, one for unpicked tasks (WaitingTasks), one for picked "in-progress" tasks (WorkingTasks):

CREATE TABLE WaitingTasks (WaitingTaskID INT IDENTITY(1,1), ColA, ColB, ...)

and

CREATE TABLE WorkingTasks (WorkingTaskID INT IDENTITY(1,1), WaitingTaskID INT, ColA, ...)

Concurrent task picking can be accomplished like so:

INSERT INTO WorkingTasks (WaitingTaskID)
SELECT TOP 1 WaitingTaskID
FROM WaitingTasks
WHERE WaitingTaskID NOT IN (SELECT WaitingTaskID FROM WorkingTasks);

SELECT SCOPE_IDENTITY();

This will create a new "working task" with its own unique ID, SCOPE_IDENTITY() will return that unique ID to you.

Upvotes: 1

user2810910
user2810910

Reputation: 287

Not quite following what you are doing here. Are you locking a record in the database table until the task is complete? If so, I don't think it's a good idea. How about instead, adding a column to your task database table that provides the status of a task? If it’s checked out, set it to true, else set it to false. Alternately, you can add a column indicating who it’s checked out to (his personID). If it’s null, it’s not checked out to anyone. Note in this solution, you only have one task database table, not two (a separate one for completed tasks). I think thats' a better solution (better normalized).

Only display tasks to a user that haven't been assigned yet. If he picks one and someone beat him to signing up for that task before he has a chance to hit the update button, you can inform him upon the update that its too late. Example: the 'where' clause in this SQL does the trick: 'update tasks set personID=233421 where personID==null'. If the update comes back with zero records updated, you know what happened.

I don't think your database/program will have any problem supporting thousands of users since its only when more than one user goes to update the task table within the same few milliseconds is there a slight delay until the first update completes. Even then, for 1000 users, I doubt more than 10 or so will click update at the same time (a few milliseconds times 10 is still not much time).

Also, I think you should let the database generate a new task number rather than programmatically doing it with task = max_int+1.That way you don't have to lock the table to ensure it doesn't change until you update. On a side note: if you need to update more than one table per update, set up a transaction. You don't need to do it for one table update.

Last note: One general solution to handling updates is to add a 'version' column (integer or long) to your tables and increament it each time someone updates the record. Then for each update, see if the version number is the same as when it was last read (SQL: update tasks set personID=233421 where version=4). If not the same, inform the user.

Upvotes: 0

Jeroen van Langen
Jeroen van Langen

Reputation: 22073

You can try using lock, but you could solve it with an update/select:

BEGIN TRAN

UPDATE Task
SET clientId = MyClientId
WHERE [PrimaryKey] = (
   SELECT TOP 1 [PrimaryKey]
   FROM Tasks
   WHERE clientId IS NULL
   ORDER BY CreationDateTime)

SELECT * FROM Tasks
WHERE [PrimaryKey] = MyClientId

COMMIT TRAN

Something like this?


Or you could use the OUPUT Clause:

DECLARE @TaskId AS INT

UPDATE TOP(1) Task
SET clientId = MyClientId
OUTPUT Task.[PrimaryKey] INTO @TaskId
WHERE clientId IS NULL
ORDER BY CreationDateTime

SELECT @TaskId

(not tested it)

source: http://blog.sqlauthority.com/2007/10/01/sql-server-2005-output-clause-example-and-explanation-with-insert-update-delete/

Upvotes: 0

Related Questions