Reputation: 71

MS SQL Server - Frequent Deadlock Issues

Target Database : MS SQL Server 2012

Requirement : Maintain summary of user’s workload in a table. Following is the table structure used to maintain this summary. Table is getting updated on modification of status of workitem assigned to user.

PARTICIPANT_ID varchar NO PK, Unique non clustered index
OFFERED_COUNT numeric YES
ALLOCATED_COUNT numeric YES
STARTED_COUNT numeric YES
SUSPENDED_COUNT numeric YES

Issue : We are facing frequent deadlocks on this table. Herewith I have attached deadlock graph for reference.

Investigation done so far :

Target Rows (Updates) in deadlocks are independent to each other so no question of one transaction waiting for other to relinquish lock.Even inserts to the table becomes part of deadlock.
Same transaction is working with similar concurrent load in Oracle.
Suspecting that table is getting locked instead of particular row, we even tried to disable escalation of locks at a table using following command but deadlocks persisted.

ALTER TABLE Table_name SET (LOCK_ESCALATION DISABLE)

Deadlock graph

<deadlock>
 <victim-list>
  <victimProcess id="process133f03d498" />
 </victim-list>
 <process-list>
  <process id="process133f03d498" taskpriority="0" logused="3308" waitresource="RID: 11:1:819250:71" waittime="2377" ownerId="29578701" transactionname="implicit_transaction" lasttranstarted="2017-11-09T09:29:43.397" XDES="0x10b0b716a8" lockMode="U" schedulerid="6" kpid="7332" status="suspended" spid="210" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2017-11-09T09:29:45.977" lastbatchcompleted="2017-11-09T09:29:45.397" lastattention="1900-01-01T00:00:00.397" clientapp="jTDS" hostname="PAWSPPAS02" hostpid="123" loginname="STG_SUITE_SSO" isolationlevel="read committed (2)" xactid="29578701" currentdb="11" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
   <executionStack>
    <frame procname="adhoc" line="1" stmtstart="40" sqlhandle="0x02000000ee409c1fd278abb7f476707399656cbedec9d1960000000000000000000000000000000000000000">
update [RS_WI_PARTICIPANT_SUMM] set [ALLOCATED_COUNT]=[ALLOCATED_COUNT]-1 where [PARTICIPANT_ID]= @P0    </frame>
   </executionStack>
   <inputbuf>
(@P0 nvarchar(4000))update [RS_WI_PARTICIPANT_SUMM] set [ALLOCATED_COUNT]=[ALLOCATED_COUNT]-1 where [PARTICIPANT_ID]= @P0    </inputbuf>
  </process>
  <process id="processc70140188" taskpriority="0" logused="85392" waitresource="RID: 11:1:819250:30" waittime="2196" ownerId="29574187" transactionname="implicit_transaction" lasttranstarted="2017-11-09T09:29:16.427" XDES="0x1150f5d6a8" lockMode="U" schedulerid="4" kpid="384" status="suspended" spid="141" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2017-11-09T09:29:46.157" lastbatchcompleted="2017-11-09T09:29:45.470" lastattention="1900-01-01T00:00:00.470" clientapp="jTDS" hostname="PAWSPPAS02" hostpid="123" loginname="STG_SUITE_SSO" isolationlevel="read committed (2)" xactid="29574187" currentdb="11" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
   <executionStack>
    <frame procname="adhoc" line="1" stmtstart="40" sqlhandle="0x020000001c3c261d27cfa98e13a9cb06f6c30e40cfcaa4a50000000000000000000000000000000000000000">
update [RS_WI_PARTICIPANT_SUMM] set [ALLOCATED_COUNT]=[ALLOCATED_COUNT]+1 where [PARTICIPANT_ID]= @P0    </frame>
   </executionStack>
   <inputbuf>
(@P0 nvarchar(4000))update [RS_WI_PARTICIPANT_SUMM] set [ALLOCATED_COUNT]=[ALLOCATED_COUNT]+1 where [PARTICIPANT_ID]= @P0    </inputbuf>
  </process>
 </process-list>
 <resource-list>
  <ridlock fileid="1" pageid="819250" dbid="11" objectname="STG_SUITE_SSO.dbo.RS_WI_PARTICIPANT_SUMM" id="lock1236714900" mode="X" associatedObjectId="72057599198953472">
   <owner-list>
    <owner id="processc70140188" mode="X" />
   </owner-list>
   <waiter-list>
    <waiter id="process133f03d498" mode="U" requestType="wait" />
   </waiter-list>
  </ridlock>
  <ridlock fileid="1" pageid="819250" dbid="11" objectname="STG_SUITE_SSO.dbo.RS_WI_PARTICIPANT_SUMM" id="lockdacb14f00" mode="X" associatedObjectId="72057599198953472">
   <owner-list>
    <owner id="process133f03d498" mode="X" />
   </owner-list>
   <waiter-list>
    <waiter id="processc70140188" mode="U" requestType="wait" />
   </waiter-list>
  </ridlock>
 </resource-list>
</deadlock>

Upvotes: 3

Answers (1)

Remus Rusanu

Reputation: 294197

One thread has updated the row 819250:71 and now wants to update the row 819250:30, the other has updated the row 819250:30 and now wants to update the row 819250:71.

I can only speculate about what is going on here (since you are omitting so many details), but my guess is that the application has some logic that is updating two 'participants' in some sort of exchange and, inside a database transaction, is trying to increase one's amount +1 and then decrease the other one's amount -1. However, the application logic is doing it consistently in opposite order (on thread is doing in a transaction [A+=1, B-=1] while the other is doing it [B+=1, A-=1]). No schema design can prevent this deadlock, only application logic change. Look into why concurrent opposite updates are so frequent.

This does not seem to be a issue of using a non-unique non-clustered index on [PARTICIPANT_ID], because in that case the engine would locate the rows 819250:71 and 819250:30 in the same order for both statements and would not deadlock. Hence my conclusion that we have (at least) two update statements in a single transaction that updates two different rows (two @P0 values) and the app logic is updating the same set of participants in opposite order from 2 transactions (2 requests perhaps).

Furthermore:

<process id="498" "RID: 11:1:819250:71" lasttranstarted="2017-11-09T09:29:43.397" lastbatchstarted="2017-11-09T09:29:45.977" lastbatchcompleted="2017-11-09T09:29:45.397"... >
<process id="188" "RID: 11:1:819250:30" lasttranstarted="2017-11-09T09:29:16.427" lastbatchstarted="2017-11-09T09:29:46.157" lastbatchcompleted="2017-11-09T09:29:45.470" ...>

If you look at the two transactions:

process 1 started the transaction at 9:29:43.397, completed a batch at 9:29:45.397 and started the current batch at 9:29:45.977
process 2 started the transaction at 9:29:16.427, completed a batch at 9:29:45.470 and started the current batch at 9:29:46.157

This time sequence corroborates my hypothesis. The transactions contain a sequence of batches, indicating individual updates (eg. Entity Framework apply changes or similar), which apply to different rows (different IDs). The IDs are overlapping and the updates are applied in inconsistent order (again, typical behavior of an ORM like EF applying changes).

If my speculation is right, then you should look at locking a higher level logical entity first. If these are participants, there must be something they participate into, like a game. Before updating the participants, lock the game: eg. using sp_getapplock on the [GAME_ID] at the start of the transaction. This way the order of updating the participants is no longer dangerous for deadlocks, because the updates are serialized for each game.

Other solution is to sort the participants before applying the updates so that every transaction updates the rows in the same order.

We do have only one non clustered index on [PARTICIPANT_ID] column

Almost certain this should be a clustered index. There are very few cases when having a non-clustered index and a heap make sense.

Upvotes: 4

MS SQL Server - Frequent Deadlock Issues

Answers (1)

Related Questions