J.V.
J.V.

Reputation: 5

SQL Performance Issues running Query with CTE and CrossJoin

So, the original task was to create a report that identified the number of concurrent transactions for a given 5 min interval and provide a high-water mark week by week of the most concurrent transactoins. I've solved the problem but the query tends to run around 3-3.5hrs for 3 months of data. This is not really what I was hoping for.

For the purposes of this exercise, there are only two Columns I care about:

Transaction_Data.DateTime (Start of Transaction: e.g. 2018-01-01 23:59:59.999)
Transaction_Data.Duration (Integer seconds: e.g. 272)

The tricky part of this query is that I need to take DateTime + Duration and break it up into 5 min intervals. So, if the timestamp is 10:02:00 and the Duration is 715 seconds (ending at 10:13:55), then I need to count the transaction for the 10:00, 10:05, and 10:10 intervals.

To accomplish this, I use a CTE to populate a temporary table that has all the intervals for the range of time in question and then use a Cross Join query to place the transactions into their various intervals.

Here is the query:

DECLARE @Times Table (DateTime DateTime)
DECLARE @StartDate AS DATETIME
,       @EndDate AS DATETIME
SET     @StartDate = '2018-06-01'
SET     @EndDate = '2018-08-31 23:59:59.999';

WITH DateIntervalsCTE AS
(
 SELECT 0 i, @StartDate AS Date
 UNION ALL
 SELECT i + 5, DATEADD(minute, i, @StartDate )
 FROM DateIntervalsCTE 
 WHERE DATEADD(minute, i, @StartDate ) < @EndDate
)
INSERT INTO @Times (DateTime)
SELECT DISTINCT Date 
FROM DateIntervalsCTE
OPTION(MAXRECURSION 32767);

select  Convert(varchar(10),DateAdd(day,-DatePart(weekday,IntData.DateTime)+1,Convert(varchar(10),IntData.DateTime,101)),101) as 'WeekOf'
,       Max(Count) as 'MaxConcur'
from
    (select t.DateTime
    ,       count(t.DateTime) as 'Count'
    from    Transaction_Detail TD
    cross join @Times t
    where   t.DateTime 
            between         
            DateAdd(ss,-(((DatePart(mi,TD.DateTime)%5)*60)+DatePart(ss,TD.DateTime)),DateAdd(ms,-DatePart(ms,TD.DateTime),TD.DateTime))
            and DateAdd(ss,TD.Duration,TD.DateTime)
    group by 
            t.DateTime) as IntData
group by
        Convert(varchar(10),DateAdd(day,-DatePart(weekday,IntData.DateTime)+1,Convert(varchar(10),IntData.DateTime,101)),101)

With respect to my DateAdd stuff in the Where clause, I'm trying to round the start time down to the nearest 5 minute interval so that the CrossJoin will match for the starting interval.

With respect to the Data, I can't change anything about the structure as the Data is generated by Third Party application. In general, I consider the whole SQL Server to be Read-Only so I generally avoid creating any kind of static tables on it. Since I support a lot of these DBs for various clients, the ideal here is that the code can simply be pasted into a SSMS window and executed.

With respect to the performance issues, the CTE part itself runs fast enough, but the way the query runs I expect that comparing 26k (3-months) of 5-min intervals to 325k transaction records is the true root of my problem. What's 8.5 billion operations between friends, right?

In the interest of full disclosure, while I write a lot of t-sql and have for a lot of years, this is the first time I've made use of CTE and CrossJoins. It is entirely possible I screwed something up and haven't been able to detect it, but from what diagnostic stuff I've done, it appears to be reporting accurately, albeit slooowly.

What I'm hoping for with this request is for someone with more T-SQL knowledge then me to point out a better way to accomplish what I'm trying to accomplish that runs in terms of minutes rather than hours. While I wouldn't turn my nose up at a rewritten solution, I would be quite happy to simply be pointed in the direction of a better technique.

If you've read this far, thank you for your time.

-J.V.

Sample Input

DateTime    Duration
2018-06-01 00:04:55.223 57
2018-06-01 00:04:56.223 58
2018-06-01 00:08:37.180 62
2018-06-01 00:08:37.180 62
2018-06-01 00:20:29.183 10
2018-06-01 00:28:38.423 0
2018-06-01 00:28:53.190 15
2018-06-01 00:31:52.690 195
2018-06-01 00:32:20.917 209
2018-06-01 00:32:54.690 756

Note: This is a very tiny sample of what the input looks like.

Sample Output

WeekOf      MaxConcur
05/27/2018  101
06/03/2018  169
06/10/2018  189
06/17/2018  148
06/24/2018  186
07/01/2018  218
07/08/2018  222
07/15/2018  210
07/22/2018  219
07/29/2018  225
08/05/2018  243
08/12/2018  231
08/19/2018  253
08/26/2018  220

Final Solution

Firstly, thanks to everyone who replied. This was really awesome for me and I learned some interesting ideas on how to solve SQL problems. In particular thanks to KumarHarsh for bringing me close enough to get to a final resolution that brings up the data in 15s, which is far faster than I hoped for. Here is the final query (apologies if I'm not doing this right but I felt the final answer needed to be shared):

 -- Set Up Time Range
declare     @minDate DateTime='2018-06-01 00:00:00.000'
declare     @maxDate DateTime='2018-08-31 23:59:59.999' 

-- Build Temporary Interval Table
create      table #TimesTable (
    [DateTime] DateTime not null 
,   [DateCol] Date not null
)

-- Populate Interval Table (5min Intervals)
insert      into #TimesTable
select      dateadd(minute,(RowNum-1)*5,@minDate) as 'DateTime'
,           cast(dateadd(minute,(RowNum-1)*5,@minDate) as Date) as 'Date'
from (
    select      ROW_NUMBER()over(order by (select null)) as 'RowNum'
    from        master..spt_values a
    CROSS JOIN  master..spt_values b
) as TT
where       cast(dateadd(minute,RowNum*5,@minDate) as DateTime) < @maxDate

-- Build Table Indexes
create      clustered index ix_datecol on #TimesTable ([DateTime],[DateCol])

-- Query the Data
select      Convert(varchar(10),DateAdd(day,-DatePart(weekday,IntData.DateTime)+1,Convert(varchar(10),IntData.DateTime,101)),101) as 'WeekOf'
,           Max(Count) as 'MaxConcur'
from (
    select      TData.DateTime
    ,           TData.[DateCol]
    ,           Count(TData.DateTime) as 'Count'
    from        dbo.Transaction_Detail TD
    outer apply (
        select      TT.DateTime
        ,           TT.[DateCol]
        from        #TimesTable as TT
        where       TT.DateTime 
                    between         
                    TD.DateTime and DateAdd(ss,TD.Duration,TD.DateTime)
    ) as TData
    group by    TData.DateTime,TData.[DateCol] 
) as IntData
where       Convert(varchar(10),DateAdd(day,-DatePart(weekday,IntData.DateTime)+1,Convert(varchar(10),IntData.DateTime,101)),101) is not null
group by    Convert(varchar(10),DateAdd(day,-DatePart(weekday,IntData.DateTime)+1,Convert(varchar(10),IntData.DateTime,101)),101)
order by    'WeekOf'

drop table  #TimesTable

From Kumar's example I had to make a few changes:

  1. As I stipulated originally I didn't want to make permanent changes to the DB, so instead I switched his example to a Temp Local table and that seemed to be sufficient. I drop the table after everything is done so that I can rerun at different intervals and the query has already cleaned up after itself.
  2. I really liked the Cartesian Product idea since it doesn't have the recursion limit of the CTE approach. I did have to limit the Cartesian Product though because I wanted this to be compact and I was going to drop the temp table anyway.
  3. I got rid of one value in the Interval Table generation that didn't seem to be doing anything.
  4. I renamed some things. I found the reuse of the same table alias to make it more difficult to understand the mechanics.
  5. I filtered out a NULL row from the result set that was showing up for some reason.

Upvotes: 0

Views: 632

Answers (2)

KumarHarsh
KumarHarsh

Reputation: 5094

Regarding your script,

  1. Why use distinct in CTE,remove it or your CTE is bad

  2. Do not use table variable,use temp table

  3. Group By "Convert(varchar(10),DateAdd(day,-DatePart(weekday,IntData.DateTime)+1,Convert(varchar(10),IntData.DateTime,101)),101)"

Group by can be replace with ,notice extra () around , (DatePart(weekday,getdate())+1)

select DateAdd(day,-(DatePart(weekday,getdate())+1),Convert(varchar(10),getdate(),101))

select DateAdd(day,-(DatePart(weekday,getdate())+1),cast(getdate() as date))

I am not sure but it semm that this part is long and wrong.

Do only this change and check.

My way,

First create and time table table. This is One time time table creation.

You can create in whichever way you want

declare @minDate Datetime='2005-01-01'
create table TimesTable ([DateTime] DateTime not null ,[DateCol] Date not null)

insert into TimesTable
select dateadd(minute,rn*5,@minDate),cast(dateadd(minute,rn*5,@minDate) as date)
from
(
select a.number, ROW_NUMBER()over(order by (select null))rn 
from master..spt_values a
CROSS JOIN master..spt_values b
)t4

Create clustered index ix_datecol on TimesTable ([DateTime],[DateCol])

-- where '2005-01-01' is any min value choose min value according to your requirement

Index can be change if it is not working and script showing some improvement.

DECLARE @StartDate AS DATETIME
,       @EndDate AS DATETIME
SET     @StartDate = '2018-06-01'
SET     @EndDate = '2018-08-31 23:59:59.999';



select  Convert(varchar(10),DateAdd(day,-DatePart(weekday,IntData.DateTime)+1,Convert(varchar(10),IntData.DateTime,101)),101) as 'WeekOf'
,       Max(Count) as 'MaxConcur'
from
    (select t.DateTime,t.[DateCol]
    ,       count(t.DateTime) as 'Count'
    from    dbo.Transaction_Detail TD
    outer apply(select t.DateTime,t.[DateCol] from TimesTable t
    where   t.DateTime 
            between         
            DateAdd(ss,-(((DatePart(mi,TD.DateTime)%5)*60)+DatePart(ss,TD.DateTime)),DateAdd(ms,-DatePart(ms,TD.DateTime),TD.DateTime))
            and DateAdd(ss,TD.Duration,TD.DateTime)
            )t
    group by 
            t.DateTime,t.[DateCol] ) as IntData
group by
        t.[DateCol]

    -- In place of cross join ,try OUTER APPLY once

My script will throw error or give incorrect output in the beginning.But I am sure it can be corrected.

Understand the idea behind t.[DateCol] and tune the query accordingly.

let me know the performance if it is working.

Upvotes: 1

Alex
Alex

Reputation: 5157

Approach:

  1. Try to pre-compute everything in your CTE. e.g. [WeekOf] value select Convert(varchar(10),DateAdd... should come from CTE/table variable

  2. Avoid/minimise computations in GROUP BY and WHERE e.g where t.DateTime between DateAdd .... should be just a simple range condition.

  3. You don't need a CROSS JOIN

Result:

DECLARE @Times Table (IntervalStart DateTime, IntervalEnd DateTime, [WeekOf] DATETIME)
DECLARE @StartDate AS DATETIME
,       @EndDate AS DATETIME
SET     @StartDate = '2018-06-01'
SET     @EndDate = '2018-08-31 23:59:59.999';

WITH DateIntervalsCTE AS
(
 SELECT 0 i, @StartDate AS Date
 UNION ALL
 SELECT i + 5, DATEADD(minute, i, @StartDate )
 FROM DateIntervalsCTE 
 WHERE DATEADD(minute, i, @StartDate ) < @EndDate
)
INSERT INTO @Times (IntervalStart, IntervalEnd, [WeekOf])
SELECT Date, DATEADD(minute, 5, Date ), Convert(varchar(10),DateAdd(day,-DatePart(weekday,Date)+1,Convert(varchar(10),Date,101)),101)
FROM DateIntervalsCTE
OPTION(MAXRECURSION 32767);

SELECT [WeekOf], MAX( [Count] ) AS 'MaxConcur'
FROM(
    SELECT t.IntervalStart, COUNT(t.IntervalStart) AS [Count], [WeekOf]
    FROM Transaction_Detail AS TD
    INNER join @Times AS t ON t.IntervalStart <= TD.DateTime  AND DATEADD( ss, TD.Duration, TD.DateTime ) < t.IntervalEnd
    GROUP BY [WeekOf], t.IntervalStart ) AS IntData
GROUP BY [WeekOf]

Explanation:

I have added PeriodEnd column to simplify join condition (see point 2.)

I have added WeekOf column computation to CTE (see point 1.)

Possible improvements

If there is an index on Transaction_Data.DateTime you can try adding a WHERE clause in the nested SELECT e.g. WHERE @StartDate <= TD.DateTime AND TD.DateTime <= @EndDate to reduce the number of transaction records searched.

Conclusion

In my experience this query should not take more than 20 min with reasonable hardware.

Try smaller range of @StartDate and @EndDate to test

Upvotes: 0

Related Questions