Yossi
Yossi

Reputation: 341

Filling down missing (time-period) rows in a SQL query result-set

Update: 8/15/2014: See final working solution at the end, based on LMU92's suggestions/sample

I have a query that reads a financial transactions table. The table is a roll-up of detail transactions, is regenerated nightly, and is used for read/SELECT only. Platform is SQL Server 2012.

This sampling is result of of the primary query, which returns historical SUM(Amount) by time period, Account, and Category. The time window reported is parameter driven, for this sample from 1/1/2014 to 5/31/2014:

    TimePeriod   Start      End        Category        Account           Amount
    ----------   --------   ---------  ------------    ----------------  ------------
    month        1/1/2014   1/31/2014  CategoryX       AccountA          2421.00
    month        4/1/2014   4/30/2014  CategoryX       AccountA          1421.00 
    month        5/1/2014   5/31/2014  CategoryY       AccountA          9421.00
    month        1/1/2014   1/31/2014  CategoryZ       AccountB          2421.00
    month        3/1/2014   3/31/2014  CategoryZ       AccountB          6421.00
    ...

The result I'm after would eliminate the gaps by filling-down any gap (transaction-less) periods with 0.00 amounts, for example, for Month/AccountA/CategoryX:

    TimePeriod   Start      End        Category        Account           Amount
    ----------   --------   ---------  ------------    ----------------  ------------
    month        1/1/2014   1/31/2014  CategoryX       AccountA          2421.00
    month        2/1/2014   2/28/2014  CategoryX       AccountA             0.00
    month        3/1/2014   3/31/2014  CategoryX       AccountA             0.00
    month        4/1/2014   4/30/2014  CategoryX       AccountA          1421.00 
    month        5/1/2014   5/31/2014  CategoryX       AccountA             0.00 

The challenge is that the rollup is by multiple period types (day/week/month/quarter/year), each of which can be broken down by Account/Category. The overall pool of records is 10 million for all periods combined, and with numbers increasing as the periods become more fractional (e.g. weeks/days).

I've tried a CTE which performed poorly (despite index tweaking, it appeared to process heavily), also attempted adding 0.00 records which as exponentially increased the number of records in the pool, since it is one record per transaction-less period (d/w/m/q/y), per account (commencing with the account's first transaction date), per category, it took lots of index tweaking to get such a large pool to give us acceptable/near acceptable performance, and has also increased the time it takes to perform our nightly load. I looked into doing a cube, but that appeared to be overkill for what we are doing.

The solution I'm looking for would do this on the fly, and without calendar table. I do have a calendar dimension table if that is the only efficient way to do this on the fly).

Any suggestions are very much appreciated.

A simplified version of the DDL/(T-)SQL:

Table:

CREATE TABLE TxnRollups (
    TxnTimePeriod          VARCHAR(10), --Year/Quarter/Month/Week/Day
    TxnPeriodStartDate     DATE,
    TxnPeriodEndDate       DATE,
    TxnAccountID           VARCHAR(10),
    TxnAccountType         VARCHAR(20),
    TxnAccountName         VARCHAR(20),
    TxnAccountHierL1       VARCHAR(20),
    TxnAccountHierL2       VARCHAR(20),       
    TxnAccountHierL3       VARCHAR(20),
    TxnCategory            VARCHAR(20),
    Amount                 DECIMAL(16,3)
)

Query:

    CREATE PROCEDURE GetTxnByPeriod(@FromDate DATE, @ToDate DATE, @SummaryPeriod VARCHAR(20)) 
      AS
        BEGIN
                SELECT TxnR.TxnTimePeriod   TimePeriod,
                       TxnR.TxnPeriodStart  Start,
                       TxnR.TxnPeriodEnd    End,
                       TxnR.TxnCategory     Category,
                       TxnR.TxnAccountName  Account,
                       SUM(TxnRAmount)      Amount
                From TxnRollups TxnR
                WHERE 
                       TxnR.TxnTImePeriod = @SummaryPeriod AND
                       TxnR.TxnPeriodEnd BETWEEN @FromDate AND @ToDate
                GROUP BY
                       TxnR.TxnTimePeriod,
                       TxnR.TxnPeriodStart,
                       TxnR.TxnPeriodEnd,
                       TxnR.TxnCategory,
                       TxnR.TxnAccountName

    END

Update: 8/15/2014: Final working solution

Overview

With the following approach, I was able to produce the desired result-set, and also eliminate the need of using the a roll-up, instead querying the raw data-set directly. Most importantly:

Execution time: 1k-1.5k ms depending on the amount of data returned. We did not even get get this performance out of our roll-up.

Structure:

  1. Per LMU92's initial suggestion, created a Months table with FirstdayofMonth/LastdayofMonth
  2. Altered our transaction detail raw (transaction detail) data-set with First/Last month columns, and updated their contents to match the First/Last day of the month for the date of each respective row.
  3. Added numeric (FLOAT) columns to both tables and stored the First/Last date representations as numeric. The objective with this was to boost performance by doing our date compares using floats.
  4. Created indexes on the numeric date columns.
  5. Used a CTE with a cross join on a set of unique columns (account ID/Category) and the months table. 6 CTE Results to #Temp table. 7 Created non-clustered indexes on #Temp table 8 Selected a distinct List of Date/Account/Category/Sum(Amount) from the transaction detail table into another #Temp table. 9 Added indexes 10 RIGHT OUTER JOIN of the Full Dates/Acct table, with the transaction detail, results as desired (see question).

** this is an abbreviated version of the code (some debug, non-essential's removed) **

CREATE PROCEDURE    GetTransactionsByMonth
                    (
                            @FromDate                   DATE                =           NULL        , 
                            @ToDate                     DATE                =           NULL        ,
                            @TxnCategory                VARCHAR(7)          =           'CAT1'      ,
                            @AccountType1               VARCHAR(21)         =           NULL        ,
                            @AccountType2               VARCHAR(21)         =           NULL        ,
                            @AccountType3               VARCHAR(21)         =           NULL        ,
                            @Debug                      BIT                 =           0           
                    ) 

WITH RECOMPILE

            AS
                BEGIN

                    DECLARE @True   AS BIT = 1
                    DECLARE @False  AS BIT = 0

                    PRINT IIF(@Debug = @True, 'START Procedure - ' + CAST(SYSDATETIMEOFFSET() AS VARCHAR),NULL)
                    IF @Debug = @True
                        BEGIN
                            SET STATISTICS TIME ON
                            SET STATISTICS IO   ON
                            DECLARE @NoCountState int = @@OPTIONS & 512;
                            SET NOCOUNT OFF;
                        END
                    /*================================================================================================================
                        Initialization - Declarations
                      ================================================================================================================*/
                    DECLARE         @MinDateN                       FLOAT
                    DECLARE         @MaxDateN                       FLOAT
                    DECLARE         @MinDateD                       DATE
                    DECLARE         @MaxDateD                       DATE
                    /*================================================================================================================
                        Initialization - Establish Date Ranges
                      ================================================================================================================*/
                    SET @FromDate   =   DATEADD(MM, DATEDIFF(MM, 0, @FromDate), 0)   --Set @FromDate to first of the requested month
                    SET @MinDateN   =   FLOOR(CAST(CAST(@FromDate as DateTime) as float)) 
                    SET @ToDate     =   DATEADD(MM, DATEDIFF(MM, 0, @ToDate), 0)     --Set @ToDate to first of the requested month
                    SET @MaxDateN   =   FLOOR(CAST(CAST(@ToDate as DateTime) as float)) 
                    SET @MinDateD = CAST(FLOOR(CAST(@MinDateN AS FLOAT)) AS DATETIME)  --For output only
                    SET @MaxDateD = CAST(FLOOR(CAST(@MaxDateN AS FLOAT)) AS DATETIME) --For output only

                    ; WITH CTE_MonthsRollUp AS (

                        SELECT DISTINCT
                                        MonthsTable.NFirstDayOfMonth        AS      PeriodStartN            ,
                                        MonthsTable.NLastDayOfMonth         AS      PeriodEndN              ,
                                        MonthsTable.FirstDayOfMonth         AS      PeriodStartD            ,
                                        MonthsTable.LastDayOfMonth          AS      PeriodEndD              ,
                                        TransactionDetail.AccountId         AS      AccountId           ,
                                        TransactionDetail.AcctCategory      AS      AcctCategory

                                  FROM  dbo.Months                          AS      MonthsTable

                            CROSS JOIN  dbo.tblTxnDetail                    AS      TransactionDetail

                                 WHERE  TransactionDetail.AccountId     IS      NOT NULL
                                 AND    TransactionDetail.AcctCategory       =      @AcctCategory
                                 AND    TransactionDetail.AccountType       IN      (
                                                                                        @AccountType1       ,
                                                                                        @AccountType2       ,
                                                                                        @AccountType3       ,
                                                                                    )

                                   AND  MonthsTable.NFirstDayOfMonth        <=      @MaxDateN
                                   AND  MonthsTable.NLastDayOfMonth         >=      @MinDateN
                                   AND  MonthsTable.NLastDayOfMonth         >=      (
                                                                                        SELECT  MIN(NFirstDayOfMonth)           AS  EarliestTxnDateN
                                                                                        FROM    tblTxnDetail        AS  ValidateAccount
                                                                                        WHERE   ValidateAccount.AccountId   =   TransactionDetail.AccountId
                                                                                    )
                        GROUP BY
                                        MonthsTable.NFirstDayOfMonth                                        ,
                                        MonthsTable.NLastDayOfMonth                                         ,
                                        MonthsTable.FirstDayOfMonth                                         ,
                                        MonthsTable.LastDayOfMonth                                          ,
                                        TransactionDetail.AccountId                                     ,
                                        TransactionDetail.AcctCategory

                    )

                    SELECT      MonthsRollupResults.* 
                      INTO      #TMonthsRollup
                      FROM      CTE_MonthsRollUp MonthsRollupResults
                    OPTION     (RECOMPILE)

                    ;           
                    CREATE NONCLUSTERED INDEX   [#idxTMonthsRollup_AccountIDandTxnCategory_Join]
                                           ON   [dbo].[#TMonthsRollup] ([AccountId],[AcctCategory], PeriodStartN)
                    INCLUDE                     (PeriodStartD, PeriodEndD)
                    ;

                    ; WITH CTE_AccountList  AS  (
                                                    SELECT DISTINCT     DistinctAccountList.NFirstDayOfMonth        AS      PeriodStartN            ,
                                                                        DistinctAccountList.NLastDayOfMonth         AS      PeriodEndN              ,
                                                                        DistinctAccountList.AccountId               AS      AccountId               , 
                                                                        DistinctAccountList.AcctCategory            AS      AcctCategory            ,
                                                                        DistinctAccountList.AccountType             AS      AccountType             ,
                                                                        DistinctAccountList.Account                 AS      AccountName             ,
                                                                        DistinctAccountList.ACCOUNTL1               AS      AccountHierarchyL1      ,
                                                                        DistinctAccountList.ACCOUNTL2               AS      AccountHierarchyL2      ,
                                                                        DistinctAccountList.ACCOUNTL3               AS      AccountHierarchyL3      ,
                                                                        SUM(DistinctAccountList.Amount)             AS      PeriodAmount        


                                                               FROM     tblTxnDetail                                AS      DistinctAccountList

                                                              WHERE     DistinctAccountList.NFirstDayOfMonth        <=      @MaxDateN
                                                                AND     DistinctAccountList.NLastDayOfMonth         >=      @MinDateN
                                                                AND     DistinctAccountList.AccountId               IS      NOT NULL
                                                                AND     DistinctAccountList.AcctCategory            =       @AcctCategory
                                                                AND     DistinctAccountList.AccountType             IN      (
                                                                                                                                @AccountType1       ,
                                                                                                                                @AccountType2       ,
                                                                                                                                @AccountType3       ,

                                                                                                                            )

                                                           GROUP BY     DistinctAccountList.NFirstDayOfMonth                                ,
                                                                        DistinctAccountList.NLastDayOfMonth                                 ,
                                                                        DistinctAccountList.AccountId                                       ,
                                                                        DistinctAccountList.AcctCategory                                    ,
                                                                        DistinctAccountList.AccountType                                     ,
                                                                        DistinctAccountList.Account                                         ,
                                                                        DistinctAccountList.ACCOUNTL1                                       ,
                                                                        DistinctAccountList.ACCOUNTL2                                       ,
                                                                        DistinctAccountList.ACCOUNTL3                                       


                    ) 


                    SELECT      DistinctAccountList.* 
                      INTO      #TAccountList
                      FROM      CTE_AccountList DistinctAccountList
                    ;

                    CREATE NONCLUSTERED INDEX   [#idxTAccountList_DistincAccountDetailsList_For_Join]
                                           ON   [dbo].[#TAccountList] ([AccountId],[AcctCategory], PeriodStartN)
                    INCLUDE                     (AccountName, PeriodCredit, PeriodDebit, PeriodAmount, AccountType, 
                                                 AccountHierarchyL1,AccountHierarchyL2,AccountHierarchyL3,AccountHierarchyL4,
                                                 AccountHierarchyL5, BSReportHierarchyL1, BSReportHierarchyL2, BSReportHierarchyL3,
                                                 BSReportHierarchyL4, PLReportHierarchyL1, PLReportHierarchyL2, PLReportHierarchyL3
                                                )
                    ;
                    SELECT DISTINCT         'Month'                             AS      Period              ,
                                            @MinDateD                           AS      ReportStart         ,
                                            @MaxDateD                           AS      ReportEnd           ,   
                                            tMonthRollup.AccountId              AS      AccountId           , 
                                            tMonthRollup.AcctCategory           AS      AcctCategory        , 
                                            tMonthRollup.PeriodStartD           AS      PeriodStart         , 
                                            tMonthRollup.PeriodEndD             AS      PeriodEnd           , 
                                            AccountList.AccountName             AS      AccountName         ,
                                            AccountList.PeriodAmount            AS      PeriodAmount        ,
                                            AccountList.AccountType             AS      AccountType         ,
                                            AccountList.AccountHierarchyL1      AS      AccountHierarchyL1  ,
                                            AccountList.AccountHierarchyL2      AS      AccountHierarchyL2  ,
                                            AccountList.AccountHierarchyL3      AS      AccountHierarchyL3  


                               FROM         #TAccountList                       AS      AccountList
                   RIGHT OUTER JOIN         #TMonthsRollup                      AS      tMonthRollup

                                 ON         AccountList.AccountId           =       tMonthRollup.AccountId
                                AND         Accountlist.AcctCategory                =       tMonthRollup.AcctCategory
                                AND         AccountList.PeriodStartN            =       tMonthRollup.PeriodStartN

            --------------------------------------------------------
            PRINT IIF(@Debug = @True, 'END PROCEDURE - ' + CAST(SYSDATETIMEOFFSET() AS VARCHAR),NULL)
            --------------------------------------------------------
            IF @Debug = @True
                BEGIN
                    SET STATISTICS TIME OFF
                    SET STATISTICS IO   OFF
                    IF @NoCountState <> 0 
                        SET NOCOUNT ON
                END
END

Upvotes: 2

Views: 2919

Answers (2)

Lmu92
Lmu92

Reputation: 952

Here's a coded version how to perform the rollup and joini it against a calendar table:

            WITH cte_Rollup as
           (SELECT TxnR.TxnTimePeriod   TimePeriod,
                   TxnR.TxnPeriodStart  Start,
                   TxnR.TxnPeriodEnd    End,
                   TxnR.TxnCategory     Category,
                   TxnR.TxnAccountName  Account,
                   SUM(TxnRAmount)      Amount
            From TxnRollups TxnR
            WHERE 
                   TxnR.TxnTImePeriod = @SummaryPeriod AND
                   TxnR.TxnPeriodEnd BETWEEN @FromDate AND @ToDate
            GROUP BY
                   TxnR.TxnTimePeriod,
                   TxnR.TxnPeriodStart,
                   TxnR.TxnPeriodEnd,
                   TxnR.TxnCategory,
                   TxnR.TxnAccountName
            ), cte_Calendar AS
            (
            SELECT cal.Period, Min(cal.PeriodDate) as PeriodStart, Max(cal.PeriodDate) as PeriodEnd
            FROM calendar cal
            WHERE  cal.PeriodDate BETWEEN @FromDate AND @ToDate AND cal.Period = @SummaryPeriod
            GROUP BY cal.TImePeriod,cal.MonthValue
            )
            SELECT *
            FROM cte_Calendar
            LEFT OUTER JOIN cte_Rollup  ON cte_Calendar.PeriodStart = cte_Rollup.TxnPeriodStart

Upvotes: 2

Lmu92
Lmu92

Reputation: 952

I'd recommend an indexed month table with just columns: the Start and End date. Then perform a simple left outer join on your TxnRollups table. Adding a clustered index to the month table (StartDate and EndDate) and a noclustered index on TxnRollups on TxnPeriodStart and TxnPeriodEnd would help to speed it up even further.

Upvotes: 0

Related Questions