sparsh610

Reputation: 1601

Multi-threading: Multiple threads interacting with the same table

Interview question

Say we have an Employee table with 2 million records, and we need to cut each employee's salary by 10% (some processing is needed) and then save it back to the collection. How can you do it efficiently?

I told him we could use the Executor framework to create multiple threads that fetch values from the table, process them, and save them to a list.
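Roughly, this is what I had in mind (just a rough sketch; the table/column names, pool size and page size are my own assumptions):

import java.sql.*;
import java.util.concurrent.*;

public class SalaryCut {
    static final String URL = "jdbc:...";   // connection details omitted

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        int pageSize = 10_000;
        for (int offset = 0; offset < 2_000_000; offset += pageSize) {
            final int off = offset;
            pool.submit(() -> processPage(off, pageSize));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    // Fetch one page of employees, cut salaries by 10%, write them back.
    static void processPage(int offset, int limit) {
        try (Connection con = DriverManager.getConnection(URL);
             PreparedStatement select = con.prepareStatement(
                     "select id, salary from employee order by id limit ? offset ?");
             PreparedStatement update = con.prepareStatement(
                     "update employee set salary = ? where id = ?")) {
            select.setInt(1, limit);
            select.setInt(2, offset);
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    update.setDouble(1, rs.getDouble("salary") * 0.9); // the "processing"
                    update.setLong(2, rs.getLong("id"));
                    update.addBatch();
                }
            }
            update.executeBatch();
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}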

Then he asked me how I would check whether a record has already been processed or not, and there I was clueless (I don't know how to do that).

I'm not even sure whether my approach is good or not.

Please help.

Upvotes: 1

Views: 2038

Answers (3)

Bohemian

Reputation: 425073

The best approach given the question as stated is to use pure SQL, something like:

update employees set
salary = salary * .9

It is very hard to imagine needing to do something to employee data that SQL could not handle.

If by some quirk of bad design you really needed to do something to employee-type data that SQL absolutely could not do, then you would open a cursor over the rowset and iterate through it, making the updates synchronously so you only do one pass over the data.

In pseudo code:

cursor = forUpdate("select * from employees for update")
while (cursor.next()) {
    cursor.salary = cursor.salary * .9
}

This is the simplest and likely fastest executing approach.

---

Regarding logging

It’s only 2M rows, which is a “small” quantity, so most DBs could handle it in a single transaction. If not, add a where clause to the query, e.g. where id between <start> and <end>, to chunk the process up into loggable amounts if you are using the shell-script approach.

If using the code approach, most databases allow you to commit while holding the cursor open, so just commit every 10K rows or so.
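For example, in JDBC a rough sketch of the cursor approach with periodic commits might look like this (assuming the driver supports updatable result sets that stay open across commits; the table and column names are illustrative):

import java.sql.*;

public class SalaryCutCursor {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:...")) {
            con.setAutoCommit(false);
            try (Statement st = con.createStatement(
                         ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
                 // some drivers also want an explicit FOR UPDATE clause here
                 ResultSet rs = st.executeQuery("select id, salary from employee")) {
                int n = 0;
                while (rs.next()) {
                    rs.updateDouble("salary", rs.getDouble("salary") * 0.9);
                    rs.updateRow();            // write the change back through the cursor
                    if (++n % 10_000 == 0) {
                        con.commit();          // commit every 10K rows, cursor stays open
                    }
                }
                con.commit();                  // final partial chunk
            }
        }
    }
}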

Regarding locking

Similar considerations to logging. All rows in such a query are locked for the duration of the transaction. Given that it will take a while to run, pick a quiet time to run it. If it’s really a big deal, chunk the work up, but realise that locking is unavoidable.

Upvotes: 1

John Kane

Reputation: 4443

One thing you could do is use a producer/consumer model, where you have one thread feeding the others the records to update. This way you would not have to worry as much about duplicate processing.
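A minimal sketch of that idea, assuming a JDBC source and illustrative table/column names, could use a BlockingQueue with a single reader thread and several workers:

import java.sql.*;
import java.util.concurrent.*;

public class ProducerConsumerCut {
    record Row(long id, double salary) {}
    static final Row POISON = new Row(-1, 0);   // signals a worker to stop

    public static void main(String[] args) throws Exception {
        BlockingQueue<Row> queue = new ArrayBlockingQueue<>(10_000);
        int workers = 8;
        ExecutorService pool = Executors.newFixedThreadPool(workers + 1);

        // Single producer: the only thread that reads the table, so each row is handed out once.
        pool.submit(() -> {
            try (Connection con = DriverManager.getConnection("jdbc:...");
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("select id, salary from employee")) {
                while (rs.next()) {
                    queue.put(new Row(rs.getLong("id"), rs.getDouble("salary")));
                }
            }
            for (int i = 0; i < workers; i++) {
                queue.put(POISON);              // one stop signal per worker
            }
            return null;
        });

        // Consumers: do the processing and write the new salary back.
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try (Connection con = DriverManager.getConnection("jdbc:...");
                     PreparedStatement up = con.prepareStatement(
                             "update employee set salary = ? where id = ?")) {
                    for (Row r = queue.take(); r != POISON; r = queue.take()) {
                        up.setDouble(1, r.salary() * 0.9);   // the 10% cut
                        up.setLong(2, r.id());
                        up.executeUpdate();
                    }
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}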

Upvotes: 1

Emmett Deen

Reputation: 731

I would load this table in, then add a column for the state. By default you could set this column to "Not Processed". Once a thread starts processing an employee it would change the state to "Processing", and when finished it would finally switch it to "Processed".

Having 3 states like this would also allow you to use the column as a lock, preventing a record from being processed twice.
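A rough JDBC sketch of that claim step, assuming an employee table with id, salary and status columns, is a conditional update whose row count tells you whether your thread won the claim:

// Sketch only: claim the row atomically; an update count of 0 means another
// thread already moved it out of "Not Processed".
static boolean claimAndProcess(Connection con, long employeeId) throws SQLException {
    try (PreparedStatement claim = con.prepareStatement(
            "update employee set status = 'Processing' where id = ? and status = 'Not Processed'")) {
        claim.setLong(1, employeeId);
        if (claim.executeUpdate() == 0) {
            return false;                       // lost the race, skip this record
        }
    }
    try (PreparedStatement done = con.prepareStatement(
            "update employee set salary = salary * 0.9, status = 'Processed' where id = ?")) {
        done.setLong(1, employeeId);
        done.executeUpdate();
        return true;
    }
}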

Upvotes: 0
