bigfoot
bigfoot

Reputation: 455

Covering index including rowversion? Good or bad

I'm working on synchronizing clients with data for eventual consistency. The server will publish a list of database ids and rowversion/timestamp. Client will then request data with incorrect version number. The primary reason for inconsistent data is networking issues between broker nodes, split brain, etc.

When I read data from my tables, I request data based on a predicate that is not the primary key. I iterate available regions to read data per region. This is my select:

SELECT DatabaseId, VersionTimestamp, OperationId
FROM TableX 
WHERE RegionId = 1

Since this leads to an index scan per query, I'm wondering if a non-clustered index on my RegionId column, and include the selected columns in that index:

CREATE NONCLUSTERED INDEX [ID_TableX_RegionId_Sync]
ON [dbo].[TableX] ([RegionId])
INCLUDE ([DatabaseId],[VersionTimestamp],[OperationId])

VersionTimestamp is rowversion/timestamp column, and will of course change whenever a row is updated, so I'm wondering if it is a poor design choice to include this column in an index since it will need to be updated at every insert/update/delete?

Since this will result in n index scans, rather than n index seeks, it might be better to read all the data once, and then group by regionId and fill in empty lists of rows where a regionId doesn't have any data.

The real life scenario is a bit more complicated, as there are table relationships that will also have to be queried. I haven not yet looked at including one to many relationships in my version queries.

This is primarily about better understanding the impact of covering indexes and figuring out how to better use them. Since I am going to read all the data from the table in any case, it is probably cheaper to load them all at once. However, reading them as from the query above, it makes my code a lot cleaner for this simple no-relationship example alone.

Edit: Alternative 2 Another option that came to mind, is creating a covering index on RegionId, and include my primary key (DatabaseId).

SELECT DatabaseId
FROM TableX WHERE RegionId=1

And then a new query where I select the needed columns WHERE DatabaseId IN(list, of, databaseId)

For the current scenario, there are only max thousands of rows in the table, and not in the millions. Network traffic for the two (x n) queries might most likely outweigh the benefits of using indexes, and be premature optimization.

Upvotes: 0

Views: 303

Answers (0)

Related Questions