Reputation: 973
There are several variations of this question on Stack Overflow already, but none of them helped me get to the bottom of my issue. So, here I go again with more specific details of my problem.
We've been randomly getting Transaction (Process ID xx) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Let me be clear: this is not row- or table-level locking. I've tried enough guessed/random things; I need an exact step-by-step guide on how to troubleshoot a deadlock on communication buffer resources.
If you're interested in the specific details, read on.
Specific Details of the Scenario: We have a very simple Dapper ORM based C# .NET Core Web API that takes in requests and performs CRUD operations against a database hosted on this Microsoft SQL Server. To do this, a connection manager (registered as a scoped service) opens a new IDbConnection in request scope; this connection is used to execute deletes, inserts, updates, or gets. For insert/update/delete the C# line looks like this: await connection.ExecuteAsync("<Create update or delete statement>", entity);
For GET requests we simply run await connection.QueryFirstOrDefaultAsync<TEntity>("<select statement>", entity);
There are 5 types of entity (all representing simple non-relational tables); they are all CRUDed by ID.
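To make the scenario concrete, here is a minimal sketch of that setup (class, table, and column names are illustrative, not the actual code):

using System;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;
using Dapper;

// Illustrative entity; the real API has 5 such simple, non-relational entities.
public class ProductPricing
{
    public int Id { get; set; }
    public string ProductName { get; set; }
}

// Registered as a scoped service, so each request gets its own connection.
public sealed class ConnectionManager : IDisposable
{
    public IDbConnection Connection { get; }

    public ConnectionManager(string connectionString)
    {
        Connection = new SqlConnection(connectionString);
        Connection.Open();
    }

    public void Dispose() => Connection.Dispose();
}

public class ProductPricingRepository
{
    private readonly IDbConnection _connection;

    public ProductPricingRepository(ConnectionManager manager)
        => _connection = manager.Connection;

    // Insert/update/delete: Dapper builds the parameters from the entity's properties.
    public Task<int> UpdateAsync(ProductPricing entity)
        => _connection.ExecuteAsync(
            "UPDATE dbo.ProductPricingTable SET ProductName = @ProductName WHERE Id = @Id",
            entity);

    // GET by ID.
    public Task<ProductPricing> GetAsync(int id)
        => _connection.QueryFirstOrDefaultAsync<ProductPricing>(
            "SELECT Id, ProductName FROM dbo.ProductPricingTable WHERE Id = @Id",
            new { Id = id });
}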
What has been tried so far
Server Specifications: We have Microsoft SQL Server 2016 on an Azure virtual machine with 64 cores and 400 GB RAM. The usual workload on this server is 10% CPU and 30% RAM; occasionally it goes up to 80% CPU and 350 GB RAM. Every time this issue occurred, CPU usage was under 20% (mostly around 10%, with 20% on only one occasion) and RAM was under 30%.
Deadlock XML Event as per @Dan Guzman's request
The file was too large for this post, so I created this Google Drive file. Please click the following link, then click Download in the top-right corner. It is a zip file.
https://drive.google.com/file/d/1oZ4dT8Yrd2uW2oBqBy9XK_laq7ftGzFJ/view?usp=sharing
Upvotes: 3
Views: 8541
Reputation: 973
@DanGuzman helped, so I upvoted and accepted his answer. But I'd like to summarize what went on here, what I learned, and a step-by-step approach to troubleshooting a deadlock on communication buffer resources (or any deadlock, for that matter).
Step 1
Pull the deadlock report. I used the following query, but you could also use the query @DanGuzman suggested (in the comments on this question).
SELECT
    xed.value('@timestamp', 'datetime2(3)') AS CreationDate,
    xed.query('.') AS XEvent
FROM
(
    SELECT CAST([target_data] AS XML) AS TargetData
    FROM sys.dm_xe_session_targets AS st
    INNER JOIN sys.dm_xe_sessions AS s
        ON s.address = st.event_session_address
    WHERE s.name = N'system_health'
        AND st.target_name = N'ring_buffer'
) AS Data
CROSS APPLY TargetData.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData (xed)
ORDER BY CreationDate DESC
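Note that the system_health ring buffer only retains recent events, so pull the report soon after the deadlock occurs. And since the app already uses Dapper, you can run the same query from C# if that is more convenient (a minimal sketch, not part of the original fix; it assumes the login has the VIEW SERVER STATE permission):

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Threading.Tasks;
using Dapper;

public class DeadlockReport
{
    public DateTime CreationDate { get; set; }
    public string XEvent { get; set; }  // the xml_deadlock_report event as XML text
}

public static class DeadlockReader
{
    // Same query as above, embedded verbatim.
    private const string Sql = @"
        SELECT
            xed.value('@timestamp', 'datetime2(3)') AS CreationDate,
            xed.query('.') AS XEvent
        FROM
        (
            SELECT CAST([target_data] AS XML) AS TargetData
            FROM sys.dm_xe_session_targets AS st
            INNER JOIN sys.dm_xe_sessions AS s
                ON s.address = st.event_session_address
            WHERE s.name = N'system_health'
                AND st.target_name = N'ring_buffer'
        ) AS Data
        CROSS APPLY TargetData.nodes('RingBufferTarget/event[@name=""xml_deadlock_report""]') AS XEventData (xed)
        ORDER BY CreationDate DESC;";

    public static async Task<List<DeadlockReport>> GetReportsAsync(string connectionString)
    {
        using var connection = new SqlConnection(connectionString);
        var rows = await connection.QueryAsync<DeadlockReport>(Sql);
        return rows.AsList();
    }
}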
Step 2
Locate the deadlock event matching the timing/data of your SQL exception, then read the report in conjunction with the Detecting and Ending Deadlocks guide to understand the root cause of your deadlock. In my case I was getting a deadlock on communication buffer resources, so as per the Memory section of that guide, memory must have been causing the problem. As Dan pointed out, the query below appeared in my deadlock report and was using far too much memory (as a result of being an inefficient query).
So what is a deadlock on communication buffer? If a query needs a large amount of memory to finish executing, two such queries can begin at the same time and each claim part of the memory they need; at some point the available memory is not enough, and each has to wait for memory to be freed by the completion of other queries. Both queries then wait on each other in the hope that more memory will be freed, which can lead to a deadlock on buffer resources (as per the Memory section of the guide). A quick way to confirm such memory-grant waits from code is sketched after the input buffer below.
<inputbuf>
@SomeStatus1 nvarchar(4000),@ProductName nvarchar(4000),@ProductNameSide nvarchar(4000),@BayNo nvarchar(4000),@CreatedDateTime datetime,@EffectiveDate datetime,@ForSaleFrom datetime,@ForSaleTo datetime,@SetupInfoNode nvarchar(4000),@LocationNumber nvarchar(4000),@AverageProductPrice decimal(3,2),@NetAverageCost decimal(3,1),@FocustProductType nvarchar(4000),@IsProduceCode nvarchar(4000),@ActivationIndicator nvarchar(4000),@ResourceType nvarchar(4000),@ProductIdentifierNumber nvarchar(4000),@SellingStatus nvarchar(4000),@SectionId nvarchar(4000),@SectionName nvarchar(4000),@SellPriceGroup nvarchar(4000),@ShelfCapacity decimal(1,0),@SellingPriceTaxExclu decimal(2,0),@SellingPriceTaxInclu decimal(2,0),@UnitToSell nvarchar(4000),@VendorNumber nvarchar(4000),@PastDate datetime,@PastPrice decimal(29,0))
UPDATE dbo.ProductPricingTable
SET SellingPriceTaxExclu = @SellingPriceTaxExclu, SellingPriceTaxInclu = @SellingPriceTaxInclu,
SellPriceGroup = @SellPriceGroup,
ActivationIndicator = @ActivationIndicator,
IsProduceCode = @IsProduceCode,
EffectiveDate = @EffectiveDate,
NetCos
</inputbuf>
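As mentioned above, one way to confirm that sessions are actually stuck waiting on memory grants while this is happening is to query the sys.dm_exec_query_memory_grants DMV (a sketch, not from the original posts; it also requires VIEW SERVER STATE):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Threading.Tasks;
using Dapper;

public class PendingMemoryGrant
{
    public short SessionId { get; set; }
    public long RequestedMemoryKb { get; set; }
    public long? GrantedMemoryKb { get; set; }  // NULL while the session is still waiting
    public long? WaitTimeMs { get; set; }
}

public static class MemoryGrantChecker
{
    public static async Task<IEnumerable<PendingMemoryGrant>> GetPendingGrantsAsync(string connectionString)
    {
        const string sql = @"
            SELECT session_id          AS SessionId,
                   requested_memory_kb AS RequestedMemoryKb,
                   granted_memory_kb   AS GrantedMemoryKb,
                   wait_time_ms        AS WaitTimeMs
            FROM sys.dm_exec_query_memory_grants
            WHERE grant_time IS NULL;  -- still waiting for a memory grant";
        using var connection = new SqlConnection(connectionString);
        return await connection.QueryAsync<PendingMemoryGrant>(sql);
    }
}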
Step 3 (The Fix)
Wait! But I used Dapper; how could it turn my query into such a deadly one? Well, Dapper is great in most situations with its out-of-the-box defaults, but clearly, in my situation, the default nvarchar(4000) killed it (please read Dan's answer to understand how such a query can cause problems). As Dan suggested, I had automatic parameter building from the input entity, like this: await connection.ExecuteAsync("<Create update or delete statement>", entity);
where entity is an instance of a C# model class. I changed it to custom parameters as shown below (for the sake of simplicity I've only added one parameter, but you would add all that are required).
var parameters = new DynamicParameters();
parameters.Add("Reference", entity.Reference, DbType.AnsiString, size: 18);
await connection.ExecuteAsync("<Create update or delete statement>", parameters);
I can see in Profiler that requests now send parameters whose types exactly match the column types. That's it; this fix made the problem go away. Thanks, Dan.
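For completeness, a fuller version of the fix looks like the sketch below, with one explicitly typed parameter per column. The sizes, precisions, and scales shown are assumptions for illustration; copy the exact types from your actual table definition.

var parameters = new DynamicParameters();
// Types below are illustrative; match them to the real column definitions.
parameters.Add("Reference", entity.Reference, DbType.AnsiString, size: 18);        // varchar(18)
parameters.Add("ProductName", entity.ProductName, DbType.String, size: 100);       // nvarchar(100)
parameters.Add("EffectiveDate", entity.EffectiveDate, DbType.DateTime);            // datetime
parameters.Add("SellingPriceTaxInclu", entity.SellingPriceTaxInclu, DbType.Decimal,
               precision: 9, scale: 2);                                            // decimal(9,2)
await connection.ExecuteAsync("<Create update or delete statement>", parameters);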
Conclusion
I conclude that in my case the deadlock on communication buffer resources occurred because of a bad query that took too much memory to execute. That happened because I blindly used Dapper's default parameter builder. Using Dapper's custom parameters solved the problem.
Upvotes: 7
Reputation: 46251
Deadlocks are often a symptom that query and index tuning is needed. Below is an example query from the deadlock trace that suggests the root cause of the deadlocks:
<inputbuf>
@SomeStatus1 nvarchar(4000),@ProductName nvarchar(4000),@ProductNameSide nvarchar(4000),@BayNo nvarchar(4000),@CreatedDateTime datetime,@EffectiveDate datetime,@ForSaleFrom datetime,@ForSaleTo datetime,@SetupInfoNode nvarchar(4000),@LocationNumber nvarchar(4000),@AverageProductPrice decimal(3,2),@NetAverageCost decimal(3,1),@FocustProductType nvarchar(4000),@IsProduceCode nvarchar(4000),@ActivationIndicator nvarchar(4000),@ResourceType nvarchar(4000),@ProductIdentifierNumber nvarchar(4000),@SellingStatus nvarchar(4000),@SectionId nvarchar(4000),@SectionName nvarchar(4000),@SellPriceGroup nvarchar(4000),@ShelfCapacity decimal(1,0),@SellingPriceTaxExclu decimal(2,0),@SellingPriceTaxInclu decimal(2,0),@UnitToSell nvarchar(4000),@VendorNumber nvarchar(4000),@PastDate datetime,@PastPrice decimal(29,0))
UPDATE dbo.ProductPricingTable
SET SellingPriceTaxExclu = @SellingPriceTaxExclu, SellingPriceTaxInclu = @SellingPriceTaxInclu,
SellPriceGroup = @SellPriceGroup,
ActivationIndicator = @ActivationIndicator,
IsProduceCode = @IsProduceCode,
EffectiveDate = @EffectiveDate,
NetCos
</inputbuf>
Although the SQL statement text is truncated, it does show that all parameter declarations are nvarchar(4000)
(a common problem with ORMs). When a column referenced in a join/where clause has a different type (e.g. varchar), comparing it to an nvarchar parameter forces an implicit conversion of the column, which prevents indexes from being used efficiently and results in full scans; those scans in turn lead to deadlocks during concurrent queries.
Change the parameter types to match those of the referenced columns and check the execution plan for efficiency.
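With Dapper specifically, besides DynamicParameters (as used in the accepted answer), string parameters can also be pinned with Dapper's DbString type. A minimal sketch, where the varchar(18)/nvarchar(20) column types are assumptions for illustration:

using System.Data;
using System.Threading.Tasks;
using Dapper;

public static class ProductPricingUpdater
{
    // DbString controls the exact SQL string type sent per parameter:
    // IsAnsi = true -> varchar, IsAnsi = false -> nvarchar; Length pins the declared size.
    public static Task<int> UpdatePriceGroupAsync(IDbConnection connection, string sellPriceGroup, string reference)
        => connection.ExecuteAsync(
            "UPDATE dbo.ProductPricingTable SET SellPriceGroup = @SellPriceGroup WHERE Reference = @Reference",
            new
            {
                SellPriceGroup = new DbString { Value = sellPriceGroup, IsAnsi = false, Length = 20 }, // nvarchar(20), assumed
                Reference = new DbString { Value = reference, IsAnsi = true, Length = 18 }             // varchar(18), assumed
            });
}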
Upvotes: 2