Reputation:
I have a table with 1.3 million rows.
Query #1 takes 29 seconds to run in SQL Server 2016 Management Studio.
Query #1:
select
*,
(select Count(*)
from [dbo].[Results] t2
where t2.RaceDate < t1.RaceDate
and t1.HorseName = t2.HorseName
and t2.Position = '1'
and t1.CourseName = t2.CourseName
and t2.CountryCode = 'GB') as [CourseDistanceWinners]
from
[dbo].[Results] t1
But query #2 takes takes several hours with the only difference being t1.HorseName = t2.HorseName
vs t1.TrainerName = t2.TrainerName
. There will be many more matches but on TrainerName than HorseName but I wasn't expecting several hours.
Query #2:
select
*,
(select Count(*)
from [dbo].[Results] t2
where t2.RaceDate < t1.RaceDate
and t1.TrainerName = t2.TrainerName
and t2.Position = '1'
and t1.CourseName = t2.CourseName
and t2.CountryCode = 'GB') as [CourseDistanceWinners]
from
[dbo].[Results] t1
I've managed to get the query down to 15 minutes using the techniques below but I still think this is a very long time. Is there anything else I can do to improve performance of Query2 or a way to rewrite it for performance?
What I have tried so far
I've changed [TrainerName] [nvarchar](255) NULL,
to [TrainerName] [nvarchar](50) NULL,
I've added a composite index and several non clustered indexes
CREATE INDEX idx_HorseName
ON [dbo].[Results] (HorseName);
CREATE INDEX idx_TrainerName
ON [dbo].[Results] (TrainerName);
CREATE INDEX idx_CourseName
ON [dbo].[Results] (CourseName);
CREATE INDEX idx_Position
ON [dbo].[Results] (Position);
CREATE INDEX idx_JockeyName
ON [dbo].[Results] (JockeyName);
CREATE INDEX idx_RaceDate
ON [dbo].[Results] (RaceDate);
CREATE INDEX idx_TrainerComposite
ON [dbo].[Results] (TrainerName, RaceDate, CourseName);
Further info:
Table structure:
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Results]
(
[CountryCode] [NVARCHAR](50) NULL,
[CourseName] [NVARCHAR](50) NULL,
[HorseName] [NVARCHAR](50) NOT NULL,
[HorseSuffix] [NVARCHAR](5) NOT NULL,
[JockeyName] [NVARCHAR](255) NULL,
[OwnerName] [NVARCHAR](255) NULL,
[Position] [NVARCHAR](255) NULL,
[PublishedTime] [NVARCHAR](6) NOT NULL,
[RaceDate] [DATETIME] NOT NULL,
[RaceTitle] [NVARCHAR](255) NULL,
[StallPosition] [NVARCHAR](255) NULL,
[TrainerName] [NVARCHAR](50) NULL,
[Rating] [INT] NULL,
CONSTRAINT [PK_Results_1]
PRIMARY KEY CLUSTERED ([HorseName] ASC,
[HorseSuffix] ASC,
[PublishedTime] ASC,
[RaceDate] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Query #1 execution plan:
Query #2 execution plan:
Upvotes: 0
Views: 249
Reputation: 1269773
Use a window function!
select r.*,
sum(case when position = 1 and country_code = 'GB' then 1 else 0 end) over
(partition by horsename, coursename
order by racedate
rows between unbounded preceding and 1 preceding
) as CourseDistanceWinners
from [dbo].[Results] r
Upvotes: 1