SQL Server - Index efficiency for group by in subselects

Question

Attempting to make indexes that are most efficient for this setup:

View (RANKED_PHONE):

SELECT PHONE_ID,
       CONTACT_ID,
       TYPE_CD,
       ROW_NUMBER() OVER( PARTITION BY CONTACT_ID, TYPE_CD 
                              ORDER BY UPDATE_DT DESC ) AS PHONE_RANK
FROM   (SELECT PHONE_ID,
               CONTACT_ID,
               TYPE_CD,
               UPDATE_DT
        FROM   contact_phone
        WHERE  PHONE_ID IN (SELECT MAX(PHONE_ID)
                            FROM   contact_phone WITH(NOLOCK)
                            GROUP  BY CONTACT_ID,
                                      TYPE_CD,
                                      PHONE_NUMBER)) nodupes

Current indices:

CREATE CLUSTERED INDEX [CIX_contactphone_PHONEID]
  ON [contact_phone] ([PHONE_ID])
GO

CREATE NONCLUSTERED INDEX [NIX_contactphone_UPDATEDT]
  ON [contact_phone] ([UPDATE_DT])
GO

CREATE NONCLUSTERED INDEX [NIX_contactphone_CONTACTID_TYPECD_PHONENUMBER]
  ON [contact_phone] ([CONTACT_ID], [TYPE_CD], [PHONE_NUMBER])
GO

Other notes:

There is an SSIS package that runs a lookup on PHONE_ID to check whether to insert or update
This view will be further filtered in selects by looking for specific CONTACT_IDs filtering out PHONE_IDs that have been used.

EDIT: I might as well include the rest of what I'm doing, so you can see the end picture:

SELECT @phone1 = COALESCE(CELL1, PERS1, BUS1, OTHER1, FAX1)
FROM(SELECT CELL1 = MAX(CASE WHEN PHONE_RANK = 1 THEN CELL END)
        , PERS1 = MAX(CASE WHEN PHONE_RANK = 1 THEN PERS END)
        , BUS1 = MAX(CASE WHEN PHONE_RANK = 1 THEN BUS END)
        , OTHER1 = MAX(CASE WHEN PHONE_RANK = 1 THEN OTHER END)
        , FAX1 = MAX(CASE WHEN PHONE_RANK = 1 THEN FAX END)
    FROM(
        SELECT * FROM [RANKED_PHONE]
        WHERE CONTACT_ID = @key
    ) phone
    PIVOT(MAX(PHONE_ID) 
        FOR TYPE_CD IN ([CELL],[PERS],[BUS],[OTHER],[FAX])
    ) as pvt) phones

After @phone1 is selected, then "AND PHONE_ID NOT IN (@phone1)" gets added to the where clause selecting from RANKED_PHONE, and then phone2 gets added.. etc. The point of this query is to Populate a set of phone number fields in a certain order of selection as per business rules. I wanted to put a view in to make as much of it cached as possible. Because of the fact that I need to filter out phone#s that have already been used, I cannot put the pivot into the view, as they need to be filtered before being pivoted.

* MAJOR EDIT *

A few things have changed here:

Because I'm filtering out used phone numbers in subsequent queries, I figured I could completely strip out the de-duping section.
I randomly found that in some instances, the UPDATE_DATE was the EXACT same, so I had to throw another field in the ORDER BY for the partition.

Also, here's some more insight on the further select statements, which may illustrate why this isn't an easy query to throw into a TVF. The differences in the coalesce columns correspond to different output columns in the select statements (see first edit for example of the select statement):

SELECT @phone1 = COALESCE(CELL1, PERS1, BUS1, OTHER1, FAX1)
...
WHERE CONTACT_ID = @key

SELECT @phone2 = COALESCE(CELL2, PERS1, PERS2, BUS1, BUS2, OTHER1, OTHER2, FAX1, FAX2)
... 
WHERE CONTACT_ID = @key
    AND PHONE_NUMBER NOT IN (SELECT PHONE_NUMBER FROM contact_phone WITH(NOLOCK)
        WHERE PHONE_ID IN (@phone1))

SELECT @phone3 = COALESCE(CELL3, PERS1, PERS2, PERS3, BUS1, BUS2, BUS3, OTHER1, OTHER2, OTHER3, FAX1, FAX2, FAX3)
...
WHERE CONTACT_ID = @key
    AND PHONE_NUMBER NOT IN (SELECT PHONE_NUMBER FROM contact_phone WITH(NOLOCK)
        WHERE PHONE_ID IN (@phone1,@phone2))

View [RANKED_PHONE]:

SELECT PHONE_ID
    , CONTACT_KEY
    , TYPE_CD
    , PHONE_NUMBER
    , ROW_NUMBER() OVER(
        PARTITION BY CONTACT_KEY,TYPE_CD
        ORDER BY UPDATE_DATE DESC, [PRIMARY] DESC
    ) PHONE_RANK
FROM contact_phone WITH(NOLOCK)

Indices:

CONSTRAINT [PK_contactphone_PHONEID] PRIMARY KEY ([PHONE_ID])

CREATE NONCLUSTERED INDEX [NIX_contactphone_UPDATEDATE_PRIMARY] ON [contact_phone] ([UPDATE_DATE],[PRIMARY])
GO

CREATE NONCLUSTERED INDEX [NIX_contactphone_CONTACTKEY_TYPECD_PHONENUMBER] ON [contact_phone] ([CONTACT_KEY],[TYPE_CD],[PHONE_NUMBER])
GO

Martin Smith · Accepted Answer

If you change the definition of

CREATE NONCLUSTERED INDEX [NIX_contactphone_CONTACTID_TYPECD_PHONENUMBER]
  ON [contact_phone] ([CONTACT_ID], [TYPE_CD], [PHONE_NUMBER])

To

CREATE NONCLUSTERED INDEX [NIX_contactphone_CONTACTID_TYPECD_PHONENUMBER]
  ON [contact_phone] ([CONTACT_ID], [TYPE_CD], [PHONE_NUMBER], PHONE_ID DESC) 
  INCLUDE (UPDATE_DT)

(NB: Whilst this appears to include an extra key column this is not actually the case. The clustered index key gets added to the key of all non unique non clustered indexes anyway. However adding it explicitly allows the ASC/DESC to be defined.)

The index does have an extra INCLUDE-d column to cover the query though.

And change the query to

SELECT PHONE_ID,
       CONTACT_ID,
       TYPE_CD,
       ROW_NUMBER() OVER( PARTITION BY CONTACT_ID, TYPE_CD 
                              ORDER BY UPDATE_DT DESC ) AS PHONE_RANK
FROM   (SELECT PHONE_ID,
               CONTACT_ID,
               TYPE_CD,
               UPDATE_DT,
               RANK() OVER (PARTITION BY CONTACT_ID, TYPE_CD, PHONE_NUMBER 
                                        ORDER BY PHONE_ID DESC) AS PHONE_ID_RANK
        FROM   contact_phone
        ) nodupes 
        WHERE PHONE_ID_RANK = 1

That slightly improves the plan.

Plan

The ordering used by the first partition of CONTACT_ID, TYPE_CD, PHONE_NUMBER, PHONE_ID DESC does not match that desired by the second part of the query CONTACT_ID, TYPE_CD,UPDATE_DT though so you will always get a sort.

SQL Server - Index efficiency for group by in subselects

Answers (2)

Related Questions