Reputation: 560
This part of the query makes it quite bad; unfortunately I can't see a way around it, only ways to optimize it.
update #db set contents = i.contents
from (select distinct
(select max(ac.contents) from ##dwv d
left join ##calendar c on 1=1
left join #db ac on d.id = ac.id
and c.ReportingPeriod = ac.DateValue and ac.Data_Type = 'ActivePeriod'
where d.ID = dd.id and c.month_number >= (cc.month_number-3)
and c.month_number <= cc.month_number) contents
,dd.id
,cc.ReportingPeriod
from #db dd
left join ##calendar cc on cc.ReportingPeriod = dd.DateValue
where dd.Data_Type = 'ActivePeriod'
)i
where i.id = #db.id and i.ReportingPeriod = #db.DateValue
I tried a MERGE first, but that wasn't getting anywhere fast, and the above puppy came to be.
The idea is to mark every customer as active in any given period (year and month, in the format 'YYYYMM') according to a specific algorithm. So for every customer that matches the report criteria I need a row that tells me whether they were active (that is: bought something recently).
#db
is a temp table where I'm gathering all the data that will later be used for aggregates to produce the report - a large table of several million rows, depending on the timeframe:
Create table #db
(
C_P varchar(6)
,Data_Type varchar(20)
,id int
,contents int
,DateValue varchar(10)
)
##dwv
is a temp table where I'm dumping the result of a select on a large view (which is itself very slow); it holds about 2.4 million rows
##calendar
is an ad-hoc table which stores every period the report encompasses, in the same 'YYYYMM' format:
select CONVERT(char(6), cast(@startdate as date), 112) "CP"
,CONVERT(char(6), cast(PKDate as date), 112) "ReportingPeriod"
,(ROW_NUMBER() over (order by CONVERT(char(6), cast(PKDate as date), 112) asc))-1
as month_number
into ##calendar
from [calendar].[dbo].[days]
where PKDate between @startdate and @enddate2
group by CONVERT(char(6), cast(PKDate as date), 112)
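Since ##calendar is built with SELECT ... INTO, it starts out with no indexes; given how often it is joined on the period column, a small supporting index may help. A sketch - the index name is illustrative, and I'm assuming the period column is named ReportingPeriod as in the report's join predicates:

```sql
-- Hypothetical supporting index on the period column, covering
-- month_number so the range filter doesn't need a lookup.
create index IX_calendar_period
    on ##calendar (ReportingPeriod)
    include (month_number);
```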
The query plan tells me that the bit c.ReportingPeriod = ac.DateValue
is the culprit - it takes 88% of the subquery's cost, which in turn accounts for 87% of the cost of the whole query.
What am I not seeing here and how can I improve that?
Upvotes: 0
Views: 1531
Reputation: 560
Just in case someone stumbles here, I'll explain what I did to trim the execution time down from 32 minutes to 15 seconds.
One: as suggested in the comments and in the answer by Tab Alleman, I looked at the indexes on the tables where HASH JOIN
showed up in the execution plan. I also had a closer look at the ON
clauses of the joins, refining them here and there, which left smaller numbers of rows in the results. To be more specific: the inline query fetching the 'contents' value for the update now runs against the source table '##dwv' joined to a preprocessed '##calendar' table, as opposed to a cross join between the two tables followed by another join to the result. This reduced the end dataset to a few hundred thousand rows instead of the 17 billion reported in the query plan.
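The rewritten query itself wasn't shown, but from the description it would look roughly like this - a sketch under my own assumptions about the final shape, not the author's exact code; table and column names follow the question:

```sql
-- Sketch of the refactored update: ##dwv joins directly to the
-- preprocessed ##calendar on the 3-month window, instead of a
-- cross join (on 1=1) filtered afterwards in the WHERE clause.
update #db
set contents = i.contents
from (
    select dd.id
          ,cc.ReportingPeriod
          ,(select max(ac.contents)
            from ##dwv d
            join ##calendar c
              on c.month_number between cc.month_number - 3
                                    and cc.month_number
            join #db ac
              on ac.id = d.id
             and ac.DateValue = c.ReportingPeriod
             and ac.Data_Type = 'ActivePeriod'
            where d.id = dd.id) contents
    from #db dd
    join ##calendar cc on cc.ReportingPeriod = dd.DateValue
    where dd.Data_Type = 'ActivePeriod'
) i
where i.id = #db.id and i.ReportingPeriod = #db.DateValue;
```

Moving the month_number range into the join predicate is what keeps the intermediate result to the 3-month slice per period instead of the full cross product.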
The effect is that the report is now lightning quick compared to previous drafts - so much so that it can be run in a loop and still produce output in a more than reasonable time.
The bottom line is that you have to pay attention to what SQL Server complains about, but also at least look at the number of rows being crunched and try to lower it whenever possible. Indexing is good, but it's not a miracle cure for everything that ails your query.
Thanks to all who took the time to write here - when several people say similar things, it's always good to sit down and think about it.
Upvotes: 1
Reputation: 31785
Hash Joins usually mean that the columns used in the JOIN are not indexed.
Make sure you have covering indexes for these columns:
d.id = ac.id and c.ReportingPeriod = ac.DateValue and ac.Data_Type
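For example - the index names, column order, and INCLUDE list below are my assumptions, not taken from the question, so adapt them to the actual workload:

```sql
-- Hypothetical covering indexes for the join predicates quoted above.
-- Leading on the equality filter (Data_Type), then the join keys,
-- with 'contents' included to avoid lookups for the MAX().
create index IX_db_join
    on #db (Data_Type, id, DateValue)
    include (contents);

create index IX_dwv_id
    on ##dwv (id);
```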
Upvotes: 1