whytheq
whytheq

Reputation: 35567

CROSS APPLY date table onto some data - why the duplication

This is the data:

SELECT X.*
INTO   #X
FROM   (
       VALUES
       ('A',CAST('01 Feb 2013' AS DATETIME)),
       ('B','01 Mar 2013'),
       ('C','01 Sep 2013')
       ) X(Player,Mth)

We have a pretty standard factory DimDate table in our warehouse.

If I do the following to find the number of days it works fine:

SELECT Player,Mth,
    numDays_mth = COUNT(XXX.DateKey),
FROM #X A
    CROSS APPLY 
    (
    SELECT DateKey 
    FROM WHData.dbo.vw_DimDate DT 
    WHERE DT.DayMarker >= A.Mth 
        AND DT.DayMarker < DATEADD(MONTH, 1, A.Mth)
    ) XXX

Also for the number of days from the previous month things work fine:

SELECT Player,Mth,
    numDays_prevMth = COUNT(YYY.DateKey)     
FROM #X A
    CROSS APPLY 
    (
    SELECT DateKey 
    FROM WHData.dbo.vw_DimDate DTT 
    WHERE DTT.DayMarker >= DATEADD(MONTH, -1, A.Mth) 
    AND DTT.DayMarker < A.Mth
    ) YYY

If I combine the two then there are problems:

SELECT Player,Mth,
    numDays_mth = COUNT(XXX.DateKey),
    numDays_prevMth = COUNT(YYY.DateKey)    
FROM #X A
     CROSS APPLY 
     (
     SELECT DateKey 
     FROM WHData.dbo.vw_DimDate DT 
     WHERE DT.DayMarker >= A.Mth 
         AND DT.DayMarker < DATEADD(MONTH, 1, A.Mth)
     ) XXX
    CROSS APPLY 
    (
    SELECT DateKey 
    FROM WHData.dbo.vw_DimDate DTT 
    WHERE DTT.DayMarker >= DATEADD(MONTH, -1, A.Mth) 
    AND DTT.DayMarker < A.Mth
    ) YYY

I realise there are plenty of other ways of getting these day counts but I'm trying to understand the workings of the APPLY operator as much as anything.

How do I keep using APPLY twice and stop the duplication of data?


note

Changing the SELECT clause to the following works on this trivial example but has a massive impact on performance for a production script:

SELECT Player,Mth,
    numDays_mth = COUNT(DISTINCT XXX.DateKey),
    numDays_prevMth = COUNT(DISTINCT YYY.DateKey)   

Upvotes: 1

Views: 1272

Answers (1)

Damien_The_Unbeliever
Damien_The_Unbeliever

Reputation: 239724

I would move the COUNTs inside of the CROSS APPLY so that each one only produces one row of output for each input row on the left hand side. That way you avoid creating additional rows to act as input to the second (or subsequent) APPLY:

SELECT Player,Mth,
    numDays_mth = XXX.Cnt,
    numDays_prevMth = YYY.Cnt   
FROM #X A
     CROSS APPLY 
     (
     SELECT COUNT(DateKey) 
     FROM WHData.dbo.vw_DimDate DT 
     WHERE DT.DayMarker >= A.Mth 
         AND DT.DayMarker < DATEADD(MONTH, 1, A.Mth)
     ) XXX (Cnt)
    CROSS APPLY 
    (
    SELECT COUNT(DateKey)
    FROM WHData.dbo.vw_DimDate DTT 
    WHERE DTT.DayMarker >= DATEADD(MONTH, -1, A.Mth) 
    AND DTT.DayMarker < A.Mth
    ) YYY (Cnt)

Upvotes: 2

Related Questions