Clustered Index on date

Question

I'm redesigning a database for a project, since the current version does not match the reality of it's usage anymore and is causing serious problems.

Several of the main tables consist of this structure:

Object Identifier (FK, 4 byte int)
Datetime of record (can be duplicate for different objects)
Value (float)

From other questions (i.e. Composite Clustered Index in SQL Server) i gather that clustered indexes shouldn't be used on composite keys, and preferably smaller datatypes.

The data in this table (several millions records and growing by 10k a day) only ever gets inserted or queried based on the datetime field (get min/max value or all values for object in date range), updates and deletes almost never happen and only outside hours where querying happens. The order of inserts is usually with increasing datetimes, but slight differences are observed and sometimes larger backlogs need to be inserted. The order of a potential ID field would correctly match the datetime growth of a single object, but won't in general for all records.

My question is: I see several different ways to do Keys/Indexes on this kind of table, but what approach for the keys would work best in these cases?

Gottfried Lesigang · Accepted Answer

In your case it will not work to create a one-column clustered index on your DATETIME column. Values won't be unique...

To avoid fragmentation, it is very important, that the clustered index is implicitly sorted. Easiest is an IDENTITY column...

When you create a clustered index the clustered index is the table. It is - literally - the physically stored amount of data. This index includes all other columns implicitly. When you create another index (non-clustered of course), the existing clustered index will serve as lookup table.

With several million records you stumble into I/O issues. Many pages must be loaded to get all values together. Starting with SQL Server 2012 MS introduced column store indexes, which combine the advantages of a relational DBMS with the speed of column oriented storage systems. This index is stored in one block which can speed up queries against mass data enormously! The price you pay is disc space and slower manipulations.

You might use a 2-cloumn clustered index where you combine the date (in the first place) with an IDENTITY column. This will sort your table physically in the date's order. This should work pretty fast too.

What is best for you depends on many details:

Clustered Index on date

Answers (1)

Related Questions