Tom
Tom

Reputation: 144

Understanding Include on Index

I have the below query:

USE [AxReports]
GO

DECLARE @paramCompany varchar(3)
SET @paramCompany = 'adf'

SELECT  stl.MAINSALESID,
        st.DATAAREAID,
        Sum(sl.SALESQTY) as 'Quantity',
        Sum(sl.SALESQTY * sl.SALESPRICE) as 'SalesValue'
INTO #openrel
FROM
    DynamicsV5Realtime.dbo.SALESTABLE st
INNER JOIN
    DynamicsV5Realtime.dbo.SALESLINE sl
ON
    sl.SALESID = st.SALESID
    and sl.DATAAREAID = st.DATAAREAID
INNER JOIN
    DynamicsV5Realtime.dbo.INVENTTABLE it
ON
    it.ITEMID = sl.ITEMID
    and it.DATAAREAID = sl.DATAAREAID
INNER JOIN
    DynamicsV5Realtime.dbo.SALESTABLELINKS stl
ON
    stl.SUBSALESID = st.SALESID
    and stl.DATAAREAID = st.DATAAREAID
WHERE
    st.DATAAREAID = @paramCompany
    and st.SALESTYPE = 3 -- Release Order
    and st.SALESSTATUS = 1
    and sl.SALESSTATUS <> 4
    and it.ITEMGROUPID <> 'G0022A'
GROUP BY
    stl.MAINSALESID,
    st.DATAAREAID

My execution plan is recommending an index of :

USE [DynamicsV5Realtime]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[INVENTTABLE] ([DATAAREAID],[ITEMGROUPID])
INCLUDE ([ITEMID])
GO

However I already have an index on that table that is similar which the plan is using but performs a table scan against it. The current index is below:

CREATE NONCLUSTERED INDEX [I_ITEMGROUPIDX] ON [dbo].[INVENTTABLE]
(
[ITEMID] ASC,
[DATAAREAID] ASC
)
INCLUDE (   [ITEMGROUPID]) 
GO

I have an understanding that you should only put things as an included column when you are not bothered about them being sorted at the leaf level (I think thats correct?).

In this case the WHERE clause has it.ITEMGROUPID <> 'G0022A' so putting that as a key column would make sense as it will be quicker to seek that column in order, (again I think I am right in saying that?)

However what about the joins, why does it recommend to put the ITEMID column as an include but not the DATAAREAID column? ITEMID and DATAAREAID make up the PK in this case so is it something to do with not needing to sort both columns and would perhaps using the existing index but putting the ITEMGROUPID as a key columm be a better solution that adding a new index? (thats something I can test I suppose)

Thanks

Upvotes: 2

Views: 107

Answers (2)

Jon Hanna
Jon Hanna

Reputation: 113272

Let's consider this table in relative isolation first; that is we'll only pay attention to those parts of the query where it is directly mentioned.

Executing the query needs to do the following:

  1. Find all rows in INVENTTABLE where the ITEMGROUPID column is equal to 'G0022A'.
  2. Find the values of the DATAAREAID and ITEMID columns in those rows, for use in finding the necessary rows in SALESLINE.

The best index for doing part one is one that has a key on ITEMGROUPID but no other columns. Such a key (we'll ignore included columns for now) would enable a table scan to find the relevant rows and those only.

If there was no such index but there was an index that had ITEMGROUPID as one of its columns, then that index could be used in a table scan instead, though not quite as efficiently.

Now, when we come to considering the second part, the only values we actually care about getting from the row are DATAAREAID and ITEMID.

If those fields where included, then they can be used in an index scan.

If they are actually parts of the key, or one of them is and the other is included, then that index can also be used for such an index scan.

So. At this point, considering only those aspects we said we would consider at this point and ignoring other considerations (index size, cost of inserts, etc), then any of the following indices would be useful here:

CREATE NONCLUSTERED INDEX [someIndexName]
ON [dbo].[INVENTTABLE] ([ITEMGROUPID],[DATAAREAID],[ITEMID])

CREATE NONCLUSTERED INDEX [someIndexName]
ON [dbo].[INVENTTABLE] ([ITEMGROUPID])
INCLUDE ([DATAAREAID],[ITEMID])

CREATE NONCLUSTERED INDEX [someIndexName]
ON [dbo].[INVENTTABLE] ([ITEMGROUPID])
INCLUDE ([ITEMID],[DATAAREAID])

CREATE NONCLUSTERED INDEX [someIndexName]
ON [dbo].[INVENTTABLE] ([DATAAREAID],[ITEMGROUPID])
INCLUDE ([ITEMID])

CREATE NONCLUSTERED INDEX [someIndexName]
ON [dbo].[INVENTTABLE] ([ITEMID],[ITEMGROUPID])
INCLUDE ([DATAAREAID])

CREATE NONCLUSTERED INDEX [someIndexName]
ON [dbo].[INVENTTABLE] ([ITEMGROUPID],[DATAAREAID])
INCLUDE ([ITEMID])

CREATE NONCLUSTERED INDEX [someIndexName]
ON [dbo].[INVENTTABLE] ([ITEMGROUPID],[ITEMID])
INCLUDE ([DATAAREAID])

Each one of these indices contains ITEMGROUPID as all or part of the key and both ITEMID and DATAAREAID as either part of the key, or as an included column.

Note that they index you do have is the opposite to this; it has the column that would be ideally a key as an included column, and the others as part of the key. It's better than nothing and the query planner can re-jigger things to make use of it, but it's not the ideal key for what we've determined we want.

Now, lets consider the query as a whole.

  1. Note that we will be searching SALESTABLE based on its DATAAREAID column.
  2. Note that SALESLINE is joined to that column on its own DATAAREAID column.
  3. Note that INVENTTABLE is in turn joined to that column on SALESLINE based on its own DATAAREAID column.

From this we can deduce that we logically only want those records from INVENTTABLE that have the value @paramCompany in their DATAAREAID column.

And the planner made that deduction.

So, considering the query as a whole, we can change our two actions above to:

  1. Find all rows in INVENTTABLE where the ITEMGROUPID column is equal to 'G0022A' and where DATAAREAID is equal to @paramCompany.
  2. Find the values of the DATAAREAID (already got in step 1) and ITEMID columns in those rows.

Hence the ideal index for this would be either:

CREATE NONCLUSTERED INDEX [someName]
ON [dbo].[INVENTTABLE] ([ITEMGROUPID],[DATAAREAID])
INCLUDE ([ITEMID])
GO

OR

CREATE NONCLUSTERED INDEX [someName]
ON [dbo].[INVENTTABLE] ([DATAAREAID],[ITEMGROUPID])
INCLUDE ([ITEMID])
GO

(Or one that includes all three in the key, but there are other reasons not to have a large key if you don't actually need it).

And the second is indeed what you were advised to do.

Upvotes: 3

SAS
SAS

Reputation: 4035

This should be easy to Google, but I would say to basically just have the columns that are used in joins in the index and include return columns so that there is no need to do a lookup on the actual table (al is included in the index). I would say recommendations can be more or less reliable, perhaps due to bad statistics or whatever, don't blindly rely on them. Also, I believe indexes can not be used when the operator is '<>'.

Upvotes: 0

Related Questions