Reputation: 767
I have a sqlserver table with the usual
intID(primary key),field1,field2,manyotherfields..., datetime TimeOperation
99% of my different kind of queries start with a TimeOperation BETWEEN startTime AND endTime
, and then select * (or count(*)) where fieldA=xxx
, and join with other smaller tables.
select *
because more or less I need all the fields.
I obviusly created an index on TimeOperation
... but performance are not good enough, so I want to add some index key columns or index included columns, but I'm a little bit confused.
I get the difference between the two, but I don't get how much adding a column in each case impacts on speed and on size.
I guess that the biggest improvement would be to create an index including ALL the columns, is it right? (but I can't afford it in terms of space)
And if I often use field1=xxx
for example, adding field1
to the index key columns (after TimeOperation
) would give better performance right?
Also...just to be sure how an index with included columns works: if I select rows with TimeOperation
in a certain range, sql seeks my TimeOperation index for the rows I'm interested in, and it is faster than scanning all the table because in the index the TimeOperation values are in ascending order, is it right? But then I need all the data now I need all the rest of the data fields of those rows...how does sql acts to retrieve the data? I guess it has a sort of bookmark to those rows in the index, right? But it has to hit the table multiple times then... so including all the columns in the index will save the time to hit the table, it it correct?
Thanks! Mattia
Upvotes: 0
Views: 708
Reputation: 107247
We will need more information on your table examples of your queries to address this fully, but:
TimeOperation
as the first column should address the bulk of queries against TimeOperation
.TimeOperation
, you might consider building your clustered index around it.field1 = x
then you need a separate index just for field1
(assuming that it is suitably selective), i.e. no TimeOperation
on the index if its not in the WHERE clause of your query.select
statement, the lookup can be avoided. But since you are using SELECT(*), covering indexes are unlikely to help .Edit
Explanation - Selectivity and density are explained in detail here. e.g. iff your queries against TimeOperation
return only a small number of rows (rule of thumb is < 5%, but this isn't always), will the index be used, i.e. your query is selective enough for SQL to choose the index on TimeOperation
.
The basic starting point would be:
CREATE TABLE [MyTable]
(
intID INT ID identity(1,1) NOT NULL,
field1 NVARCHAR(20),
-- .. More columns, which may be selected, but not filtered
TimeOperation DateTime,
CONSTRAINT PK_MyTable PRIMARY KEY (IntId)
);
And the basic indexes will be
CREATE NONCLUSTERED INDEX IX_MyTable_1 ON [MyTable](TimeOperation);
CREATE NONCLUSTERED INDEX IX_MyTable_2 ON [MyTable](Field1);
Clustering Consideration / Option
If most of your records are inserted in 'serial' ascending TimeOperation order, i.e. intId and TimeOperation will both increase in tandem, then I would leave the clustering on intID (the default) (i.e. table DDL is PRIMARY KEY CLUSTERED (IntId)
, which is the default anyway).
However, if there is NO correlation between IntId
and TimeOperation
, and IF most of your queries are of the form SELECT * FROM [MyTable] WHERE TimeOperation between xx and yy
then CREATE CLUSTERED INDEX CL_MyTable ON MyTable(TimeOperation)
(and changing PK to PRIMARY KEY NONCLUSTERED (IntId)
) should improve this query (Rationale: since contiguous times are kept together, fewer pages need to be read, and the bookmark lookup will be avoided). Even better, if values of TimeOperation
are guaranteed to be unique, then CREATE UNIQUE CLUSTERED INDEX CL_MyTable ON MyTable(TimeOperation)
will improve density as it will avoid the uniqueifier.
Note - for the rest of this answer, I'm assuming that your IntId
and TimeOperations
ARE strongly correlated and hence the clustering is by IntId
.
Covering Indexes
As others have mentioned, your use of SELECT (*)
is bad practice and inter alia means covering indexes won't be of any use (the exception being COUNT(*)
).
If your queries weren't SELECT(*), but instead e.g.
SELECT TimeOperation, field1
FROM
WHERE TimeOperation BETWEEN x and y -- and returns < 5% data.
Then altering your index on TimeOperation
to include field1
CREATE NONCLUSTERED INDEX IX_MyTable ON [MyTable](TimeOperation) INCLUDE(Field1);
OR adding both to the index (with the most common filter first, or the most selective first if both filters are always present)
CREATE NONCLUSTERED INDEX IX_MyTable ON [MyTable](TimeOperation, Field1);
Either will avoid the rid / key lookup. The second (,) option will address your query where BOTH TimeOperation and Field1 are filtered in a WHERE or HAVING clause.
Re : What's the difference between index on (TimeOperation, Field1) and separate indexes?
e.g.
CREATE NONCLUSTERED INDEX IX_MyTable ON [MyTable](TimeOperation, Field1);
will not be useful for the query
SELECT ... FROM MyTable WHERE Field1 = 'xyz';
The index will only be useful for the queries which have TimeOperation
SELECT ... FROM MyTable WHERE TimeOperation between x and y;
OR
SELECT ... FROM MyTable WHERE TimeOperation between x and y AND Field1 = 'xyz';
Hope this helps?
Upvotes: 1
Reputation: 71565
An index, at its most basic, creates a layer of the "hypertree" structure behind the scenes, which allows the SQL engine to more easily find rows with particular values for indexed columns. Each index creates a different way to "drill down" into the table's data using a binary search (logN performance). Each index you add makes selecting by that index faster, at the cost of slowing insertions/updates (the data must be put in and then indexes must be created).
An index, therefore, should normally be created for combinations of columns that are commonly used to filter records. I would indeed create an index on TimeOperation, and TimeOperation alone.
NEVER simply create an index including all columns of a table, especially a wide one such as this.
Upvotes: 0