Reputation: 2557
this question is about optimization. The scenario in question is as follows:
I have a table with data that should be read and acted upon by multiple transactions. The table consists of three columns: {id, message, status}.
ID MESSAGE STATUS
1 First Message NEW
2 Second Message MOD
3 Third Message FIN
....
As shown, the status is either one of these three states: {new, mod, fin}. They indicate if a row is new, is being processed at this moment or has been processed successfully. I do this so that multiple transactions work on this table. A transaction reads all "NEW" Messages and change the status to "MOD", to indicate that they are being processed and to lock them. After that they are processed. Successfully processed rows will be set to "FIN" and failed ones to "NEW" so that the next transaction will try those again.
Other than the changing of the status, the table is readonly. The accual information that is to be processed is never altered.
The process described above works, but I think its unnecessarily complicated. What I look for here is a simpler solution. I've already read about different isolation levels but they all basically cause the rows to be lock in a way so that other transactions have to wait. I dont want that. Instead I would like those already read rows to be ignored by the other queries.
Now finally, on to the question: Is it possible to lock rows on read, so that they will be ignored by other transations, instead of the transactions waiting for those rows to be unlocked?
I want something like this: Lets say the table holds 30 rows.
Is there a way to do this without the need of a status column?
Upvotes: 3
Views: 1317
Reputation: 13046
Is there a way to do this without the need of a status column?
In short, no, but not due to locking/parallelism concerns.
In reality, you need that status column simply so that, when your processing system eventually goes down, you know what rows you need to process again when it comes back up. That it actually makes it easier to do work in parallel is a side benefit.
As for the rest of your question:
I've already read about different isolation levels but they all basically cause the rows to be lock in a way so that other transactions have to wait. I dont want that. Instead I would like those already read rows to be ignored by the other queries.
...well, you're in luck, cause that's not what you want to happen anyways. Your primary problem is that in order to lock rows with only a read, you not only have to prevent other processes from reading those rows (since read-only locks are normally shared), you have to prevent rows from being inserted as well. If instead, however, you lock a row with a write, it becomes vastly simpler and easier to interleave processes.
And all you need is Cursor Stability
:
However, if any data in the row was changed, the lock is held until the change is committed.
Under this isolation level, no other application can update or delete a row while an updatable cursor is positioned on that row. Under CS, access to the uncommitted data of other applications is not possible. However, non-repeatable reads and phantom reads are possible.
...allowing other processors to update similar data is what non-repeatable reads are about. We actually want them to occur! We don't want phantom reads (which is what would happen if two rows read the same data and then tried to update it), but we can design our application around that.
One way to do this is to start your unit of work (this might be implicit or explicit, depending on the platform), and then issue an update statement:
UPDATE Transactions SET status = 'MOD'
WHERE status = 'NEW'
Then just retrieve your updated rows so you know what to process:
SELECT id, message
FROM Transactions
WHERE status = 'MOD'
Mark them as finished:
UPDATE Transaction SET status = 'FIN'
WHERE status = 'MOD'
And finally commit the changes:
COMMIT
(note that this version has a slight weakness - if the system goes down after updating to 'FIN'
but before the commit, the row will be reset to 'NEW'
, which might be its own problem. But at least 'MOD'
would be rolled back correctly)
Upvotes: 2