broadband
broadband

Reputation: 3488

How to make group by query faster?

I have next table with nearly 2 millions records and of course increasing everyday. Some table records (below columnId are foreign keys to parent tables e.g. DirectionId --> Direction table, ...):

Id         TypeId   DirectionId UserId  IndicatorId Date                                     Size      ExternalId
2003    100        1              1          1              2015-06-01 00:02:23.0000000 11931   28657340
2004    2           1               2          1             2015-06-01 00:03:21.0000000 10358   28657341
2005    2           2               2          1             2015-06-01 00:03:31.0000000 10848   28657342
2006    100        1              2          1             2015-06-01 00:03:52.0000000  7860    28657343
2007    100        1              3          1             2015-06-01 00:03:59.0000000  13353   28657344

I need to get datetime of last message TypeId and DirectionId. Query below returns what I need

select TypeId, DirectionID, max(date) as Date
from message
group by TypeId, DirectionID;

DirectionId TypeId  Date
2               1         2015-06-05 15:12:37.0000000
1               1         2015-06-05 15:12:39.0000000

The problem is that this query takes from 2500 ms to 3000 ms to execute. I added index:

CREATE NONCLUSTERED INDEX [date_index] ON [mqview].[Message] ([Date] ASC)
INCLUDE ([Id],  [TypeId],[DirectionId], [UserId], [Size], [ExternalId]) WITH (PAD_INDEX = OFF, 
STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, 
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

What can be done to get results faster?

Update

With proposed added index, I get results faster, but now I would like to get faster results with two inner joins as described above. Or eventually I could do additional 2 queries from tables MessageDirection and MessageType if nothing can be done to increase performance of below query.

SET STATISTICS TIME ON
select  mt.Code, md.Code, max(m.date) as Date
from
  mqview.Message m
  inner join mqview.MessageDirection md on (md.Id = m.DirectionId)
  inner join mqview.MessageType mt on (mt.Id = m.TypeId)
group by mt.Code, md.Code
SET STATISTICS TIME OFF

Message:

 SQL Server Execution Times:
 CPU time = 3343 ms,  elapsed time = 2817 ms.

Execution plan: enter image description here

Upvotes: 9

Views: 23259

Answers (2)

Devart
Devart

Reputation: 121912

IF OBJECT_ID('tempdb.dbo.#temp') IS NOT NULL
    DROP TABLE #temp
GO

CREATE TABLE #temp
(
    Id INT PRIMARY KEY,
    TypeId TINYINT,
    DirectionId TINYINT,
    UserId TINYINT,
    IndicatorId TINYINT,
    [Date] DATETIME2
)

CREATE /*UNIQUE*/ NONCLUSTERED INDEX ix ON #temp (TypeId, DirectionId, [Date] DESC) -- DESC
GO

INSERT INTO #temp (Id, TypeId, DirectionId, UserId, IndicatorId, [Date])
VALUES
    (2003, 100, 1, 1, 1, '20150601 00:02:23.0000000'),
    (2004, 2  , 1, 2, 1, '20150601 00:03:21.0000000'),
    (2005, 2  , 2, 2, 1, '20150601 00:03:31.0000000'),
    (2006, 100, 1, 2, 1, '20150601 00:03:52.0000000'),
    (2007, 100, 1, 3, 1, '20150601 00:03:59.0000000')


SELECT TypeId, DirectionID, MAX([Date])
FROM #temp
GROUP BY TypeId, DirectionId

update:

SELECT mt.Code, md.Code, t.[Date]
FROM (
    SELECT TypeId, DirectionID, [Date] = MAX([Date])
    FROM mqview.[Message]
    GROUP BY TypeId, DirectionId
) t
JOIN mqview.MessageDirection md on md.Id = t.DirectionId
JOIN mqview.MessageType mt on mt.Id = t.TypeId

Upvotes: 4

Richard Irons
Richard Irons

Reputation: 1473

Your index isn't helpful for the query. You are grouping by TypeId first, whereas your index has the rows ordered by Id first. So to group by TypeId and then DirectionId, the query still has to scan every single row in the table. Then, once it has grouped by these values, it has to look at every row in each group to find the maximum date.

If you had the rows indexed by TypeId, and then by DirectionId, then the grouping would be quicker because the rows would naturally be in order of their grouping within the index. If you then add Date to the index, then the query will know that the last row in each group will be the highest date, which will speed it up a little, but if you make the Date sort in the index descending, then the first row in each group will have the highest date. This means that only the first row in each group needs to be looked at. This will give a great speed boost - you will probably find that with this index your query becomes near-instant.

Because the index now contains all the values in your query, the actual rows of the table will not even need to be visited. The database engine can return the values straight from the index. This removes another step from the query processing and makes it faster again.

Your CREATE INDEX statement would look like this:

CREATE INDEX ix_myNewIndex ON [mqview].[Message] (TypeId, DirectionId, [Date] DESC) 

Upvotes: 5

Related Questions