Clustered Index improvemens for particular queries

Question

I found a table that looks like this:

 CREATE TABLE [dbo].Table1 (  
  id INT primary key IDENTITY (1, 1),  
  [idUser] INT  NOT NULL ,  
  [Amount] INT NOT NULL ,  
  [Attempts] INT NOT NULL ,  
  [date] [datetime] NOT NULL ,  
  [SUM_Amount] INT NOT NULL   
 ) ON [PRIMARY]

This table is created and populated with aggregated data for a particular period by a job.

Particularities:

This table will hold up to a million rows
idUser is unique
sum_Amount is running total of previous rows of amount.

This table will last as is, no update or delete or insert operation. Just this type of queries:

 select top (@n) * from table1
 order by [SUM_Amount] desc, [Attempts] desc

 select top (@n) * from table1
 where [SUM_Amount] >=@m order by [SUM_Amount] asc

I think it will improve preformance to change to a clustered index like this:

 CREATE TABLE [dbo].Table2 (  
  id INT IDENTITY (1, 1),  
  [idUser] INT  NOT NULL ,  
  [Amount] INT NOT NULL ,  
  [Attempts] INT NOT NULL ,  
  [date] [datetime] NOT NULL ,  
  [SUM_Amount] INT NOT NULL  

 CONSTRAINT [PK_Nueva]
   PRIMARY KEY CLUSTERED ([SUM_Amount] desc, [Attempts] desc, id asc)   

 ) ON [PRIMARY]

I read that using a no unique clustered index will add a 4 bytes hidden identifier column (http://msdn.microsoft.com/en-us/library/ms190639(v=sql.90).aspx), so I decide to add Identity (id) to cluster index (not sure if it is the right approach)

I want to ask (at the risk of sound ridiculous, but need to be sure):

How could it be improved?
Will I have an impact on disk size?
Should I rebuild the index once all data have been inserted?

EDIT:

About id, I think is there just as a bad habit. I'll kept it, not sure how previous job calculate running total (I've no access to it)

There are a lot of tables like this, like hundreds for each day(don't ask me why). That is why DBA team ask me to not create a new index because of size issues. That is why I thinking of rearrange table structure via clustered index. Also changing data types which exceeds normal ranges.

Lee · Accepted Answer

OK, so a million rows is pretty small, and your table here isn't going to be greater than 75-100 MB in size based on the information provided, therefore without knowing why you make mention these, I'm assuming that they are fairly trivial. Aside from that, you don't want to index the table and include ID (PK) because you'll get an RID Lookup. Essentially id in your PK is doing nothing for you (you said data is set, and unchanging, there is no reason to keep checking for uniqueness on anything...that was done in the source system) and if anything will slow down your queries, so what I would do is simply add a clustered index on SUM_Amount which will order the data that you're going for and create index seeks in the two queries shown.

Clustered Index improvemens for particular queries

Answers (1)

Related Questions