T. Shaffner
T. Shaffner

Reputation: 419

Table Partitioning to avoid locks

TLDR; I'm trying to partition a table such that each partition can be operated on without locking the others, but it's not working.

Background: I'm creating a data pipeline process that exports data from a central corporate database to a SQL Server DB, where it can be used for analytics. I need to export the full table each day (currently deleting the old one, archiving comes later) and it can be almost 10GB.

To make it fast enough to be usable, I break the table up into 15 chunks based on a "plan" column. This is done in the data pipeline so each of the 15 possible values in the "plan" column is treated independently; each plan chunk is pulled from the corporate DB, after which old data for that chunk is deleted in the SQL Server table and the new data is uploaded.

The problem is, doing this in 15 chunks means all the processes block one another. Everything has to wait on everything else, making it all go VERY slow in the end, and often fail due to timeouts.

To address this I've:

  1. Created 16 SQL Server filegroups for the table in
  2. Added a file to each filegroup
  3. Partitioned the table on the plan column (the boundary for each filegroup is the name of one plan)

According to the answer here this should allow me to "drop or add lots of data in one gulp".

This doesn't seem to be helping though; I still get blocks.

So two questions:

  1. Is it possible to partition such that each partition can be dealt with without blocking the others? If so, how?
  2. Do I have to rerun the partition job each day? The various tutorials I've found about partitioning seem to sometimes imply so; it's unclear to me if the DB maintains the partition automatically or not.

Alternative I've also considered breaking the destination table up into 15 tables, and trying to recombine them with a view. Would that be better? Worse?

Upvotes: 1

Views: 3939

Answers (1)

Martin Smith
Martin Smith

Reputation: 453897

I've encountered similar issues in the past where concurrent workers are processing rows in a single physical partition each and different partitions but still blocking each other.

One reason this can happen is if one of the workers manages to escalate their lock to table level. You would need to examine the waiting tasks during a period of blocking to see if this applies to you.

SQL Server does support lock escalation to partition level but this is not the default (as can cause greater probability of deadlocks for some scenarios).

To enable this option you need to set LOCK_ESCALATION = AUTO.

If the work you are doing is replacing all the rows in the partition then you should consider Charlieface's comment and doing this in a new empty table per partition and using TRUNCATE TABLE ... WITH (PARTITIONS ()) or ALTER TABLE ... SWITCH to clear the old data and ALTER TABLE ... SWITCH to bring in the new data (both should generally be very quick operations). This would mean the inserts are truly independent, and this pattern may well be more beneficial w.r.t. minimal logging for the insert.

Upvotes: 3

Related Questions