Reputation: 341
Update: 8/15/2014: See final working solution at the end, based on LMU92's suggestions/sample
I have a query that reads a financial transactions table. The table is a roll-up of detail transactions, is regenerated nightly, and is used for read/SELECT only. Platform is SQL Server 2012.
This sampling is result of of the primary query, which returns historical SUM(Amount) by time period, Account, and Category. The time window reported is parameter driven, for this sample from 1/1/2014 to 5/31/2014:
TimePeriod Start End Category Account Amount
---------- -------- --------- ------------ ---------------- ------------
month 1/1/2014 1/31/2014 CategoryX AccountA 2421.00
month 4/1/2014 4/30/2014 CategoryX AccountA 1421.00
month 5/1/2014 5/31/2014 CategoryY AccountA 9421.00
month 1/1/2014 1/31/2014 CategoryZ AccountB 2421.00
month 3/1/2014 3/31/2014 CategoryZ AccountB 6421.00
...
The result I'm after would eliminate the gaps by filling-down any gap (transaction-less) periods with 0.00 amounts, for example, for Month/AccountA/CategoryX:
TimePeriod Start End Category Account Amount
---------- -------- --------- ------------ ---------------- ------------
month 1/1/2014 1/31/2014 CategoryX AccountA 2421.00
month 2/1/2014 2/28/2014 CategoryX AccountA 0.00
month 3/1/2014 3/31/2014 CategoryX AccountA 0.00
month 4/1/2014 4/30/2014 CategoryX AccountA 1421.00
month 5/1/2014 5/31/2014 CategoryX AccountA 0.00
The challenge is that the rollup is by multiple period types (day/week/month/quarter/year), each of which can be broken down by Account/Category. The overall pool of records is 10 million for all periods combined, and with numbers increasing as the periods become more fractional (e.g. weeks/days).
I've tried a CTE which performed poorly (despite index tweaking, it appeared to process heavily), also attempted adding 0.00 records which as exponentially increased the number of records in the pool, since it is one record per transaction-less period (d/w/m/q/y), per account (commencing with the account's first transaction date), per category, it took lots of index tweaking to get such a large pool to give us acceptable/near acceptable performance, and has also increased the time it takes to perform our nightly load. I looked into doing a cube, but that appeared to be overkill for what we are doing.
The solution I'm looking for would do this on the fly, and without calendar table. I do have a calendar dimension table if that is the only efficient way to do this on the fly).
Any suggestions are very much appreciated.
A simplified version of the DDL/(T-)SQL:
Table:
CREATE TABLE TxnRollups (
TxnTimePeriod VARCHAR(10), --Year/Quarter/Month/Week/Day
TxnPeriodStartDate DATE,
TxnPeriodEndDate DATE,
TxnAccountID VARCHAR(10),
TxnAccountType VARCHAR(20),
TxnAccountName VARCHAR(20),
TxnAccountHierL1 VARCHAR(20),
TxnAccountHierL2 VARCHAR(20),
TxnAccountHierL3 VARCHAR(20),
TxnCategory VARCHAR(20),
Amount DECIMAL(16,3)
)
Query:
CREATE PROCEDURE GetTxnByPeriod(@FromDate DATE, @ToDate DATE, @SummaryPeriod VARCHAR(20))
AS
BEGIN
SELECT TxnR.TxnTimePeriod TimePeriod,
TxnR.TxnPeriodStart Start,
TxnR.TxnPeriodEnd End,
TxnR.TxnCategory Category,
TxnR.TxnAccountName Account,
SUM(TxnRAmount) Amount
From TxnRollups TxnR
WHERE
TxnR.TxnTImePeriod = @SummaryPeriod AND
TxnR.TxnPeriodEnd BETWEEN @FromDate AND @ToDate
GROUP BY
TxnR.TxnTimePeriod,
TxnR.TxnPeriodStart,
TxnR.TxnPeriodEnd,
TxnR.TxnCategory,
TxnR.TxnAccountName
END
Update: 8/15/2014: Final working solution
Overview
With the following approach, I was able to produce the desired result-set, and also eliminate the need of using the a roll-up, instead querying the raw data-set directly. Most importantly:
Execution time: 1k-1.5k ms depending on the amount of data returned. We did not even get get this performance out of our roll-up.
Structure:
** this is an abbreviated version of the code (some debug, non-essential's removed) **
CREATE PROCEDURE GetTransactionsByMonth
(
@FromDate DATE = NULL ,
@ToDate DATE = NULL ,
@TxnCategory VARCHAR(7) = 'CAT1' ,
@AccountType1 VARCHAR(21) = NULL ,
@AccountType2 VARCHAR(21) = NULL ,
@AccountType3 VARCHAR(21) = NULL ,
@Debug BIT = 0
)
WITH RECOMPILE
AS
BEGIN
DECLARE @True AS BIT = 1
DECLARE @False AS BIT = 0
PRINT IIF(@Debug = @True, 'START Procedure - ' + CAST(SYSDATETIMEOFFSET() AS VARCHAR),NULL)
IF @Debug = @True
BEGIN
SET STATISTICS TIME ON
SET STATISTICS IO ON
DECLARE @NoCountState int = @@OPTIONS & 512;
SET NOCOUNT OFF;
END
/*================================================================================================================
Initialization - Declarations
================================================================================================================*/
DECLARE @MinDateN FLOAT
DECLARE @MaxDateN FLOAT
DECLARE @MinDateD DATE
DECLARE @MaxDateD DATE
/*================================================================================================================
Initialization - Establish Date Ranges
================================================================================================================*/
SET @FromDate = DATEADD(MM, DATEDIFF(MM, 0, @FromDate), 0) --Set @FromDate to first of the requested month
SET @MinDateN = FLOOR(CAST(CAST(@FromDate as DateTime) as float))
SET @ToDate = DATEADD(MM, DATEDIFF(MM, 0, @ToDate), 0) --Set @ToDate to first of the requested month
SET @MaxDateN = FLOOR(CAST(CAST(@ToDate as DateTime) as float))
SET @MinDateD = CAST(FLOOR(CAST(@MinDateN AS FLOAT)) AS DATETIME) --For output only
SET @MaxDateD = CAST(FLOOR(CAST(@MaxDateN AS FLOAT)) AS DATETIME) --For output only
; WITH CTE_MonthsRollUp AS (
SELECT DISTINCT
MonthsTable.NFirstDayOfMonth AS PeriodStartN ,
MonthsTable.NLastDayOfMonth AS PeriodEndN ,
MonthsTable.FirstDayOfMonth AS PeriodStartD ,
MonthsTable.LastDayOfMonth AS PeriodEndD ,
TransactionDetail.AccountId AS AccountId ,
TransactionDetail.AcctCategory AS AcctCategory
FROM dbo.Months AS MonthsTable
CROSS JOIN dbo.tblTxnDetail AS TransactionDetail
WHERE TransactionDetail.AccountId IS NOT NULL
AND TransactionDetail.AcctCategory = @AcctCategory
AND TransactionDetail.AccountType IN (
@AccountType1 ,
@AccountType2 ,
@AccountType3 ,
)
AND MonthsTable.NFirstDayOfMonth <= @MaxDateN
AND MonthsTable.NLastDayOfMonth >= @MinDateN
AND MonthsTable.NLastDayOfMonth >= (
SELECT MIN(NFirstDayOfMonth) AS EarliestTxnDateN
FROM tblTxnDetail AS ValidateAccount
WHERE ValidateAccount.AccountId = TransactionDetail.AccountId
)
GROUP BY
MonthsTable.NFirstDayOfMonth ,
MonthsTable.NLastDayOfMonth ,
MonthsTable.FirstDayOfMonth ,
MonthsTable.LastDayOfMonth ,
TransactionDetail.AccountId ,
TransactionDetail.AcctCategory
)
SELECT MonthsRollupResults.*
INTO #TMonthsRollup
FROM CTE_MonthsRollUp MonthsRollupResults
OPTION (RECOMPILE)
;
CREATE NONCLUSTERED INDEX [#idxTMonthsRollup_AccountIDandTxnCategory_Join]
ON [dbo].[#TMonthsRollup] ([AccountId],[AcctCategory], PeriodStartN)
INCLUDE (PeriodStartD, PeriodEndD)
;
; WITH CTE_AccountList AS (
SELECT DISTINCT DistinctAccountList.NFirstDayOfMonth AS PeriodStartN ,
DistinctAccountList.NLastDayOfMonth AS PeriodEndN ,
DistinctAccountList.AccountId AS AccountId ,
DistinctAccountList.AcctCategory AS AcctCategory ,
DistinctAccountList.AccountType AS AccountType ,
DistinctAccountList.Account AS AccountName ,
DistinctAccountList.ACCOUNTL1 AS AccountHierarchyL1 ,
DistinctAccountList.ACCOUNTL2 AS AccountHierarchyL2 ,
DistinctAccountList.ACCOUNTL3 AS AccountHierarchyL3 ,
SUM(DistinctAccountList.Amount) AS PeriodAmount
FROM tblTxnDetail AS DistinctAccountList
WHERE DistinctAccountList.NFirstDayOfMonth <= @MaxDateN
AND DistinctAccountList.NLastDayOfMonth >= @MinDateN
AND DistinctAccountList.AccountId IS NOT NULL
AND DistinctAccountList.AcctCategory = @AcctCategory
AND DistinctAccountList.AccountType IN (
@AccountType1 ,
@AccountType2 ,
@AccountType3 ,
)
GROUP BY DistinctAccountList.NFirstDayOfMonth ,
DistinctAccountList.NLastDayOfMonth ,
DistinctAccountList.AccountId ,
DistinctAccountList.AcctCategory ,
DistinctAccountList.AccountType ,
DistinctAccountList.Account ,
DistinctAccountList.ACCOUNTL1 ,
DistinctAccountList.ACCOUNTL2 ,
DistinctAccountList.ACCOUNTL3
)
SELECT DistinctAccountList.*
INTO #TAccountList
FROM CTE_AccountList DistinctAccountList
;
CREATE NONCLUSTERED INDEX [#idxTAccountList_DistincAccountDetailsList_For_Join]
ON [dbo].[#TAccountList] ([AccountId],[AcctCategory], PeriodStartN)
INCLUDE (AccountName, PeriodCredit, PeriodDebit, PeriodAmount, AccountType,
AccountHierarchyL1,AccountHierarchyL2,AccountHierarchyL3,AccountHierarchyL4,
AccountHierarchyL5, BSReportHierarchyL1, BSReportHierarchyL2, BSReportHierarchyL3,
BSReportHierarchyL4, PLReportHierarchyL1, PLReportHierarchyL2, PLReportHierarchyL3
)
;
SELECT DISTINCT 'Month' AS Period ,
@MinDateD AS ReportStart ,
@MaxDateD AS ReportEnd ,
tMonthRollup.AccountId AS AccountId ,
tMonthRollup.AcctCategory AS AcctCategory ,
tMonthRollup.PeriodStartD AS PeriodStart ,
tMonthRollup.PeriodEndD AS PeriodEnd ,
AccountList.AccountName AS AccountName ,
AccountList.PeriodAmount AS PeriodAmount ,
AccountList.AccountType AS AccountType ,
AccountList.AccountHierarchyL1 AS AccountHierarchyL1 ,
AccountList.AccountHierarchyL2 AS AccountHierarchyL2 ,
AccountList.AccountHierarchyL3 AS AccountHierarchyL3
FROM #TAccountList AS AccountList
RIGHT OUTER JOIN #TMonthsRollup AS tMonthRollup
ON AccountList.AccountId = tMonthRollup.AccountId
AND Accountlist.AcctCategory = tMonthRollup.AcctCategory
AND AccountList.PeriodStartN = tMonthRollup.PeriodStartN
--------------------------------------------------------
PRINT IIF(@Debug = @True, 'END PROCEDURE - ' + CAST(SYSDATETIMEOFFSET() AS VARCHAR),NULL)
--------------------------------------------------------
IF @Debug = @True
BEGIN
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
IF @NoCountState <> 0
SET NOCOUNT ON
END
END
Upvotes: 2
Views: 2919
Reputation: 952
Here's a coded version how to perform the rollup and joini it against a calendar table:
WITH cte_Rollup as
(SELECT TxnR.TxnTimePeriod TimePeriod,
TxnR.TxnPeriodStart Start,
TxnR.TxnPeriodEnd End,
TxnR.TxnCategory Category,
TxnR.TxnAccountName Account,
SUM(TxnRAmount) Amount
From TxnRollups TxnR
WHERE
TxnR.TxnTImePeriod = @SummaryPeriod AND
TxnR.TxnPeriodEnd BETWEEN @FromDate AND @ToDate
GROUP BY
TxnR.TxnTimePeriod,
TxnR.TxnPeriodStart,
TxnR.TxnPeriodEnd,
TxnR.TxnCategory,
TxnR.TxnAccountName
), cte_Calendar AS
(
SELECT cal.Period, Min(cal.PeriodDate) as PeriodStart, Max(cal.PeriodDate) as PeriodEnd
FROM calendar cal
WHERE cal.PeriodDate BETWEEN @FromDate AND @ToDate AND cal.Period = @SummaryPeriod
GROUP BY cal.TImePeriod,cal.MonthValue
)
SELECT *
FROM cte_Calendar
LEFT OUTER JOIN cte_Rollup ON cte_Calendar.PeriodStart = cte_Rollup.TxnPeriodStart
Upvotes: 2
Reputation: 952
I'd recommend an indexed month table with just columns: the Start and End date. Then perform a simple left outer join on your TxnRollups table. Adding a clustered index to the month table (StartDate and EndDate) and a noclustered index on TxnRollups on TxnPeriodStart and TxnPeriodEnd would help to speed it up even further.
Upvotes: 0