Avyncentia
Avyncentia

Reputation: 49

Why do I get a ROWGROUP_FLUSH deadlock when parallel inserting into non-partitioned CCI using TABLOCK?

I have been getting persistent ROWGROUP_FLUSH deadlocks lately when inserting data into a CCI table (non-partitioned, no other indexes). Database compatibility level is 150 and uses simple recovery mode. I am inserting about 10 million rows in the transaction and the cardinality estimate looks pretty accurate:

details from actual execution plan

SSMS can't render ROWGROUP_FLUSH deadlock XMLs, but here's the representation from SentryOne: Diagram of deadlock showing parallel threads deadlocking on themselves

Here's a simplified version of the table structures and insert:

CREATE TABLE dbo.AllData (
    Metric INT NOT NULL 
    ,StartDate DATE NOT NULL 
    ,Value NUMERIC(18,0)
)
CREATE CLUSTERED COLUMNSTORE INDEX cci_AllData ON dbo.AllData

CREATE TABLE dbo.Data_MostRecent (
    Metric INT NOT NULL 
    ,StartDate DATE NOT NULL 
    ,Value NUMERIC(18,0)
)
CREATE CLUSTERED COLUMNSTORE INDEX cci_Data_MostRecent ON dbo.Data_MostRecent

--update the most recent table with the latest data
BEGIN TRANSACTION
    TRUNCATE TABLE dbo.Data_MostRecent
    INSERT INTO dbo.Data_MostRecent WITH (TABLOCK)
        (
            Metric
            ,StartDate 
            ,Value
        )
        SELECT 
            AllData.Metric 
            ,AllData.StartDate
            ,AllData.Value
        FROM dbo.AllData

        --get the last date per metric
        INNER JOIN (
            SELECT 
                Metric
                ,StartDate = MAX(StartDate)
            FROM dbo.AllData
            GROUP BY 
                Metric
        ) LastRun
            ON AllData.Metric = LastRun.Metric 
            AND AllData.StartDate = LastRun.StartDate
COMMIT TRANSACTION

I discovered that if I remove the TABLOCK hint I can still get parallel inserts (contrary to this article) and the deadlock does not appear to occur.

2 questions:

  1. Why does this work?
  2. Could removing the TABLOCK hint cause other problems? I'm guessing it's just there to allow the parallel insert, but that no longer seems to be necessary.

Thanks!

Upvotes: 0

Views: 248

Answers (1)

Ronen Ariely
Ronen Ariely

Reputation: 2434

(1) contrary to this article... Why does this work?

probably, since this is NOT what the article says.

"Must specify TABLOCK" is a limitation for SQL Server 2016 as explicitly mentioned in the article. I am not sure what is your full scenario in order to discuss the behavior in your case using SQL Server 2019. If you will provide full sample which we can execute (including the DDL+DML and sample data that you want to insert) then we might have better understanding.

enter image description here

(2) Could removing the TABLOCK hint cause other problems? I'm guessing it's just there to allow the parallel insert, but that no longer seems to be necessary.

TABLOCK has other impact on the behavior but using SQL Server 2019 latest version it is not needed for this reason probably (for parallel insert on CCI for example using bulk loading).

Note! This is an old article which you should not follow as it is. Notice that when you go the page then you get alert which explicitly inform you that using this document is not recommended

Check this document for more information:

https://learn.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-data-loading-guidance?view=sql-server-ver15

Notice under bulk loading title, this sentence:

Unlike rowstore bulk loads into SQL Server, you don't need to specify TABLOCK

Also check the limitation under the title of use a staging table...

One key limitation was that this INSERT operation was single threaded. To load data in parallel, you could create multiple staging table or issue INSERT/SELECT with non-overlapping ranges of rows from the staging table.

enter image description here

Upvotes: 1

Related Questions