anemaria20
anemaria20

Reputation: 1728

How to 'lock' database rows being processed

I have a database filled with rows and multiple threads that are accessing these rows, inputting some of the data from them in a function, producing an output, and then filling the row's missing columns with the output.

Here's the issue: Each row has an unprocessed flag which is, by default, true. So each thread is looking for rows with this flag. But each thread is getting the SAME row, it turns out...because the row is being marked as processed after the thread's job is complete, which may happen after a few seconds.

One way I avoided this was to insert a currently_processed flag for each row, mark it as false, and once a thread accesses the row, change it to true. Then when the thread is done, just change if back to false. The problem with this is that I have to use some sort of locking and not allow any other thread to do anything until this occurs. I was wondering if there's an alternative approach where I wouldn't have to do thread locking (via a mutex or something) and thus slow down the whole process.

If it helps, the code is in Ruby, but this problem is language agnostic, but here's the code to demonstrate the type of threading I'm using. So nothing special, threading on the lowest level like almost all languages have:

3.times do
  Thread.new do
   row = get_database_row
   result = do_some_processing(row)
   insert_results_into_row(result)
  end
end.each(&:join)

Upvotes: 1

Views: 230

Answers (2)

Alexander Anikin
Alexander Anikin

Reputation: 1098

I was wondering if there's an alternative approach where I wouldn't have to do thread locking (via a mutex or something) and thus slow down the whole process.

There are some ways to do this:

1) One common dispatcher for all threads. It should read all rows and put them into shared queue from where processing theads will get rows.

2) Go deeper into DB, find out if it supports something like oracles's "select for update skip locking" syntax and utilize it. For oracle you need to use his syntax in cursor and make somewhat cumbersome interaction, but at least it can work this way.

3) Partition input by, say, index of worker thread. So 1st worker out of 3 will only process rows 1,4,7 etc. 2nd worker will only process rows 2, 5, 8 etc.

Upvotes: 0

GhostCat
GhostCat

Reputation: 140467

The "real" answer here is that you need a database transaction. When one thread gets that row, then the database needs to know that this row is currently up for processing.

You can't resolve that within your application! You see, when two threads look at the same row at the same time, they could both try to write that flag ... and yep, it for sure changes to "currently processed"; and then both threads will update row data and write that back. Maybe that is not the problem if any processing results in the same final result; but if not, then all kinds of data integrity problems will arise.

So the real answer is that you step back and look how your specific database is designed in order to deal with such things.

Upvotes: 2

Related Questions