Reputation: 1711
I have an java application where 15 threads select a row from table with 11,000 records, through a synchronized method called getNext(), the threads are getting slow at selection a row, thereby taking a huge amount of time. Each of the thread follows the following process:
Thread checks if a row with resume column value set to 1 exist.
A. If it exist the thread takes the id of that row and uses that id to select another row with id greater than that of the taking id.
B. Otherwise it select's a row with id greater than 0.
The last row received based on the outcome of steps described in 1 above is marked with the resume column set to 1.
The threads takes the row data and works on it.
Question:
Upvotes: 0
Views: 1441
Reputation: 2843
1.: It seems the multiple database operations in getNext() art the bottleneck. If the data isn't change by an outside source you could read "id" and "resume" of all rows and cache it. Than you would only have one query and than operate just in memory for reads. This would safe lot of expensive DB calls in getNext():
2.: Basically you need some sort of transactions or at least add an other column that gets updated when a thread has finished processing that row. Basically the processing and the update need to happen in a single transaction. When something happens while the transaction is not finished, you can rollback to the state in which the row wasn't processed.
Upvotes: 1
Reputation: 3058
Well, would think of different issues here:
Are you keeping status in your DB? I would look for some approach where you call a select for update where you filter by inactive status (be sure just to get one row in the select) and immediately update to active (in same transaction). It would be nice to know what DB you're using, not sure if "select for update" is always an option.
Process and when you're finished, update to finished status.
Be sure to keep a timestamp in the table to identifiy when you changed status for the last time. Make yourself a rule to decide when an active thread will be treated as lost.
Define other possible error scenarios (what happens if the process fails).
You would also need to analyze the scenario. How many rows does your table have? How many threads call it concurrently? How many inserts occur in a given time? Depending on this you will have to see how DB performance is running.
I'm assuming you'r getNext() is synchronized, with what I wrote on point 1 you might get around this...
Upvotes: 1
Reputation: 4184
If the threads are all on the same machine they could use a shared data structure to avoid working on the same thing instead of synchronization. But the following assumes the threads are on on different machines ( maybe different members of an application server cluster ) and can only communicate via the database.
Remove synchronization on getNext() method. When setting the resume flag to 1 (step 2), do so atomically. update table set resume=1 where resume = 0, commit. Only one thread will succeed at this, the thread that does gets that unit of work. At the same time, set a resume time-- if the resume time is greater than some max assume the thread working on that unit of work hash crashed, set resume flag back to 0. After the work is finished set the resume time to null, or otherwise mark the work as done.
Upvotes: 1