Vetsin
Vetsin

Reputation: 2622

Hibernate SQLServer batch update transaction lock resulting in extremely poor performance?

I have a table filemapping with over 140 million rows, and then commit a batch update (on, say, a million rows) with spring data like follows:

jdbcTemplate.batchUpdate("UPDATE filemapping SET checksum=? WHERE filePath=?", new BatchPreparedStatementSetter() {
    public void setValues(PreparedStatement stmt, int issueIndex) throws SQLException {
        stmt.setString(1, batchObjects[issueIndex].getChecksum());
        stmt.setString(2, batchObjects[issueIndex].getFilePath());
    }

    public int getBatchSize() {
        return 1000;
    }
});

Which may look like:

CREATE TABLE [dbo].[filemapping] (
    [id]                INT            IDENTITY (1, 1) NOT NULL,
    [filePath]          VARCHAR (3000) NULL,
    [project_id]        INT            NOT NULL,
    [checksum]          VARCHAR (255)  NULL,
    CONSTRAINT [PK_FM] PRIMARY KEY NONCLUSTERED ([id] ASC),
    CONSTRAINT [ReFileMap] FOREIGN KEY ([project_id]) REFERENCES [dbo].[project] ([id]) ON DELETE CASCADE
);

CREATE NONCLUSTERED INDEX [MapIndexOne]
    ON [dbo].[filemapping]([project_id] ASC, [filePath] ASC);

CREATE NONCLUSTERED INDEX [MapIndexChecksum]
    ON [dbo].[filemapping]([checksum] ASC);

As this table has grown the execution time for this has gone up orders of magnitude -- a series of updates which used to take a minute now takes hours. The sp_WhoIsActive informs me that we are getting locks on the filemapping table which, which to my understanding may explain why server resources are under-utilized and the update operations are so slow.

My questions are:

Upvotes: 1

Views: 37

Answers (2)

mikkel
mikkel

Reputation: 518

Assuming the index [MapIndexOne] meant to say filePath and not fileName, you could leverage this index by including Project_Id in your statement:

jdbcTemplate.batchUpdate("UPDATE filemapping SET checksum=? WHERE 
   filePath=? AND Project_Id=?", new BatchPreparedStatementSetter() {
public void setValues(PreparedStatement stmt, int issueIndex) throws 
SQLException {
        stmt.setString(1, batchObjects[issueIndex].getChecksum());
        stmt.setString(2, batchObjects[issueIndex].getFilePath());
        stmt.setInt(3, batchObjects[issueIndex].GetProjectId());
    }
});

With this SQLServer can use [MapIndexOne] to quickly find the data records it needs to update.

And you can just kill [MapIndexChecksum], it serves no purpose - it actually slows down the updates slightly.

Upvotes: 1

Charlieface
Charlieface

Reputation: 72194

Foremost, what can be done to speed this up?

You need the correct index:

CREATE NONCLUSTERED INDEX [MapIndexFilePath]
    ON [dbo].[filemapping](filePath ASC) INCLUDE ([checksum] ASC);

This allows fast lookups based on filePath, with the column being modified as the INCLUDE.

You should also change your primary key to CLUSTERED, as I note it's currently NONCLUSTERED, so you currently have a heap table instead of a clustered index.

ALTER TABLE [dbo].[filemapping]
    DROP CONSTRAINT [PK_FM];
ALTER TABLE [dbo].[filemapping]
    ADD CONSTRAINT [PK_FM] PRIMARY KEY CLUSTERED ([id] ASC);

Alternatively, if the primary key is rarely queried and the filepath is more often queried, make the above MapIndexFilePath index CLUSTERED instead.

Are lower or higher batch sizes per transaction worth exploring?

Yes, larger batches can speed things up, but it can also cause an explosion in your LDF log file, so don't go too big. Also 5000 locks is the usual tip-over to table locking from page- or row-locking, so try to stay under that (or just force a TABLOCKX and make everyone wait so it'll go even faster).

Do the indexes matter, presuming it's the lock that's the limiting factor? How can I tell? (my statistics show CPU waits being the highest, and no normally only 1 cpu being used)

Yes, of course. If the server can't seek the index then it's going to have to scan the whole thing, taking a lot of time and locking for longer.

Why would a lock slow the updates down in the first place? Could it be something else?

You have to wait for someone else to finish what they are doing. Also even if there's no contention you still need 1m latches, which are fast but still take time.

Would Skipping locks matter at all for a series of updates with no selects?

If you mean NOLOCK then no, no, thrice no. You can't NOLOCK on an UPDATE anyway except in the read phase (it's just ignored), and it'll just make things worse. It also leaves you open to some very serious data integrity issues, so there should be almost no circumstances when you want to use it.
If you mean using READPAST, you could do that but obviously your UDPATE won't actually modify locked rows. If a table lock is in force then you might not update any rows at all. You may need some retry logic.


One further option: batch updating. You upload the batch to the server using a temp table or a Table Valued Parameter (if your driver supports it). Then you do a joined update. Make sure the temp table/TVP is indexed.

For example:

UPDATE fm
SET checksum = t.checksum
FROM filemapping fm
JOIN @temp t ON t.filePath = fm.filepath;

Upvotes: 1

Related Questions