Phrozt
Phrozt

Reputation: 103

SQL Server - Index efficiency for group by in subselects

Attempting to make indexes that are most efficient for this setup:

View (RANKED_PHONE):

SELECT PHONE_ID,
       CONTACT_ID,
       TYPE_CD,
       ROW_NUMBER() OVER( PARTITION BY CONTACT_ID, TYPE_CD 
                              ORDER BY UPDATE_DT DESC ) AS PHONE_RANK
FROM   (SELECT PHONE_ID,
               CONTACT_ID,
               TYPE_CD,
               UPDATE_DT
        FROM   contact_phone
        WHERE  PHONE_ID IN (SELECT MAX(PHONE_ID)
                            FROM   contact_phone WITH(NOLOCK)
                            GROUP  BY CONTACT_ID,
                                      TYPE_CD,
                                      PHONE_NUMBER)) nodupes 

Current indices:

CREATE CLUSTERED INDEX [CIX_contactphone_PHONEID]
  ON [contact_phone] ([PHONE_ID])
GO

CREATE NONCLUSTERED INDEX [NIX_contactphone_UPDATEDT]
  ON [contact_phone] ([UPDATE_DT])
GO

CREATE NONCLUSTERED INDEX [NIX_contactphone_CONTACTID_TYPECD_PHONENUMBER]
  ON [contact_phone] ([CONTACT_ID], [TYPE_CD], [PHONE_NUMBER])
GO 

Other notes:

EDIT: I might as well include the rest of what I'm doing, so you can see the end picture:

SELECT @phone1 = COALESCE(CELL1, PERS1, BUS1, OTHER1, FAX1)
FROM(SELECT CELL1 = MAX(CASE WHEN PHONE_RANK = 1 THEN CELL END)
        , PERS1 = MAX(CASE WHEN PHONE_RANK = 1 THEN PERS END)
        , BUS1 = MAX(CASE WHEN PHONE_RANK = 1 THEN BUS END)
        , OTHER1 = MAX(CASE WHEN PHONE_RANK = 1 THEN OTHER END)
        , FAX1 = MAX(CASE WHEN PHONE_RANK = 1 THEN FAX END)
    FROM(
        SELECT * FROM [RANKED_PHONE]
        WHERE CONTACT_ID = @key
    ) phone
    PIVOT(MAX(PHONE_ID) 
        FOR TYPE_CD IN ([CELL],[PERS],[BUS],[OTHER],[FAX])
    ) as pvt) phones

After @phone1 is selected, then "AND PHONE_ID NOT IN (@phone1)" gets added to the where clause selecting from RANKED_PHONE, and then phone2 gets added.. etc. The point of this query is to Populate a set of phone number fields in a certain order of selection as per business rules. I wanted to put a view in to make as much of it cached as possible. Because of the fact that I need to filter out phone#s that have already been used, I cannot put the pivot into the view, as they need to be filtered before being pivoted.

* MAJOR EDIT *

A few things have changed here:

Also, here's some more insight on the further select statements, which may illustrate why this isn't an easy query to throw into a TVF. The differences in the coalesce columns correspond to different output columns in the select statements (see first edit for example of the select statement):

  1. SELECT @phone1 = COALESCE(CELL1, PERS1, BUS1, OTHER1, FAX1)
    ...
    WHERE CONTACT_ID = @key
    
  2. SELECT @phone2 = COALESCE(CELL2, PERS1, PERS2, BUS1, BUS2, OTHER1, OTHER2, FAX1, FAX2)
    ... 
    WHERE CONTACT_ID = @key
        AND PHONE_NUMBER NOT IN (SELECT PHONE_NUMBER FROM contact_phone WITH(NOLOCK)
            WHERE PHONE_ID IN (@phone1))
    
  3. SELECT @phone3 = COALESCE(CELL3, PERS1, PERS2, PERS3, BUS1, BUS2, BUS3, OTHER1, OTHER2, OTHER3, FAX1, FAX2, FAX3)
    ...
    WHERE CONTACT_ID = @key
        AND PHONE_NUMBER NOT IN (SELECT PHONE_NUMBER FROM contact_phone WITH(NOLOCK)
            WHERE PHONE_ID IN (@phone1,@phone2))
    

View [RANKED_PHONE]:

SELECT PHONE_ID
    , CONTACT_KEY
    , TYPE_CD
    , PHONE_NUMBER
    , ROW_NUMBER() OVER(
        PARTITION BY CONTACT_KEY,TYPE_CD
        ORDER BY UPDATE_DATE DESC, [PRIMARY] DESC
    ) PHONE_RANK
FROM contact_phone WITH(NOLOCK)

Indices:

CONSTRAINT [PK_contactphone_PHONEID] PRIMARY KEY ([PHONE_ID])

CREATE NONCLUSTERED INDEX [NIX_contactphone_UPDATEDATE_PRIMARY] ON [contact_phone] ([UPDATE_DATE],[PRIMARY])
GO

CREATE NONCLUSTERED INDEX [NIX_contactphone_CONTACTKEY_TYPECD_PHONENUMBER] ON [contact_phone] ([CONTACT_KEY],[TYPE_CD],[PHONE_NUMBER])
GO

Upvotes: 2

Views: 2257

Answers (2)

Vlad G.
Vlad G.

Reputation: 2147

@Phrozt - your original indexes look fine as is except I would go ahead and make your clustered index unique.

There's a problem with your view definition though that leads to poor performance. The innermost subquery is only connected to the outer query by PHONE_ID ("WHERE PHONE_ID IN..."). Because of that the query optimizer does not know that you are actually deduping the numbers for that one CONTACT_ID, and ends up deduping the entire table.

I suggest changing the view to something like this:

SELECT PHONE_ID,
       CONTACT_ID,
       TYPE_CD,
       ROW_NUMBER() OVER( PARTITION BY CONTACT_ID, TYPE_CD 
                              ORDER BY UPDATE_DT DESC ) AS PHONE_RANK
FROM   (SELECT p.PHONE_ID,
               p.CONTACT_ID,
               p.TYPE_CD,
               p.UPDATE_DT
        FROM   contact_phone p
        INNER JOIN (SELECT CONTACT_ID, TYPE_CD, MAX(PHONE_ID) AS PHONE_ID
                    FROM   contact_phone WITH(NOLOCK)
                    GROUP  BY CONTACT_ID,
                              TYPE_CD,
                              PHONE_NUMBER) t
            ON p.CONTACT_ID = t.CONTACT_ID AND p.TYPE_CD = t.TYPE_CD AND p.PHONE_ID = t.PHONE_ID
) nodupes 

Also, as a side note, I doubt that using the view vs. just including the entire thing in the final query will make any noticeable difference performance-wise.

Upvotes: 0

Martin Smith
Martin Smith

Reputation: 453243

If you change the definition of

CREATE NONCLUSTERED INDEX [NIX_contactphone_CONTACTID_TYPECD_PHONENUMBER]
  ON [contact_phone] ([CONTACT_ID], [TYPE_CD], [PHONE_NUMBER])

To

CREATE NONCLUSTERED INDEX [NIX_contactphone_CONTACTID_TYPECD_PHONENUMBER]
  ON [contact_phone] ([CONTACT_ID], [TYPE_CD], [PHONE_NUMBER], PHONE_ID DESC) 
  INCLUDE (UPDATE_DT)

(NB: Whilst this appears to include an extra key column this is not actually the case. The clustered index key gets added to the key of all non unique non clustered indexes anyway. However adding it explicitly allows the ASC/DESC to be defined.)

The index does have an extra INCLUDE-d column to cover the query though.

And change the query to

SELECT PHONE_ID,
       CONTACT_ID,
       TYPE_CD,
       ROW_NUMBER() OVER( PARTITION BY CONTACT_ID, TYPE_CD 
                              ORDER BY UPDATE_DT DESC ) AS PHONE_RANK
FROM   (SELECT PHONE_ID,
               CONTACT_ID,
               TYPE_CD,
               UPDATE_DT,
               RANK() OVER (PARTITION BY CONTACT_ID, TYPE_CD, PHONE_NUMBER 
                                        ORDER BY PHONE_ID DESC) AS PHONE_ID_RANK
        FROM   contact_phone
        ) nodupes 
        WHERE PHONE_ID_RANK = 1

That slightly improves the plan.

Plan

The ordering used by the first partition of CONTACT_ID, TYPE_CD, PHONE_NUMBER, PHONE_ID DESC does not match that desired by the second part of the query CONTACT_ID, TYPE_CD,UPDATE_DT though so you will always get a sort.

Upvotes: 2

Related Questions