Túlio Henrique
Túlio Henrique

Reputation: 41

SQL Server group by optimization

On my site I have the query below

SELECT 
    personagem.PersonagemId, personagem.PersonagemTitulo, personagem.PersonagemNome, 
    count(1) AS PersonagemTotal 
FROM 
    Personagens AS personagem 
JOIN 
    ConteudosPersonagens AS conteudoPersonagem ON personagem.PersonagemId = conteudoPersonagem.PersonagemId 
JOIN 
    Conteudos AS conteudo ON conteudoPersonagem.ConteudoId = conteudo.ConteudoId 
JOIN 
    Usuarios AS usuario ON conteudo.UsuarioId = usuario.UsuarioId 
WHERE 
    personagem.CategoriaId = 2595  
    AND conteudo.ConteudoTipo = 6 
    AND conteudo.ConteudoExcluido = 0 
    AND usuario.UsuarioExcluido = 0 
    AND usuario.UsuarioBanido = 0 
GROUP BY 
    personagem.PersonagemId, personagem.PersonagemTitulo, personagem.PersonagemNome 
ORDER BY 
    PersonagemTotal DESC 

Running with SET STATISTICS IO ON command to know when using disk got the result

(6 row(s) affected)
Table 'Usuarios'. Scan count 0, logical reads 223207, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Conteudos'. Scan count 0, logical reads 480356, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ConteudosPersonagens'. Scan count 6, logical reads 394, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Personagens'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Execution Plan: Execution Plan

The tables have the count numbers

The issue and have tried everything and can not decrease the number of reads of "conteudos "and "usuarios" tables, I tried indexed view and other resources.

Thank you

Upvotes: 0

Views: 2426

Answers (2)

Túlio Henrique
Túlio Henrique

Reputation: 41

The problem the wrong number in the estimate was due to a statistics created by auto create statitics

CREATE STATISTICS [UsuarioBanido] ON [dbo].[Usuarios] ([UsuarioBanido]) WITH FULLSCAN

DROP STATISTICS [dbo].[Usuarios].[_WA_Sys_0000001C_7D78A4E7]  

After this Actual Numbers of Rows been corrected

Thanks @martinsmith

Upvotes: 0

Martin Smith
Martin Smith

Reputation: 453057

The estimated number of rows coming out of the JOIN between those tables is way off actual. (156,885 not 37)

enter image description here

as a result you are getting an underestimate of the number of rows and a quite possibly inappropriate nested loops join on the other two tables.

First try

UPDATE STATISTICS dbo.Personagens WITH FULLSCAN;
UPDATE STATISTICS dbo.ConteudosPersonagens WITH FULLSCAN;

Second try

OK you've done that and it raised the estimated rows by a bit (to 202) but it is still way off. You are now getting a hash join on Conteudos but still a nested loops on Usuarios. Probably the PersonagemId values coming from Personagens are much more populous than the average PersonagemId value is in ConteudosPersonagens.

You could add an explicit HASH JOIN hint there but one problem would be that the big under estimate of number of rows on the build input might mean spills.

So you could try this rewrite

SELECT personagem.PersonagemId,
       personagem.PersonagemTitulo,
       personagem.PersonagemNome,
       count(1) AS PersonagemTotal
FROM   Usuarios AS usuario
       INNER HASH JOIN Personagens AS personagem
                       INNER LOOP JOIN ConteudosPersonagens AS conteudoPersonagem
                         ON personagem.PersonagemId = conteudoPersonagem.PersonagemId
                       INNER HASH JOIN Conteudos AS conteudo
                         ON conteudoPersonagem.ConteudoId = conteudo.ConteudoId
         ON conteudo.UsuarioId = usuario.UsuarioId
WHERE  personagem.CategoriaId = 2595
       AND conteudo.ConteudoTipo = 6
       AND conteudo.ConteudoExcluido = 0
       AND usuario.UsuarioExcluido = 0
       AND usuario.UsuarioBanido = 0
GROUP  BY personagem.PersonagemId,
          personagem.PersonagemTitulo,
          personagem.PersonagemNome
ORDER  BY PersonagemTotal DESC 

Edit: But unluckily that gives you an under-estimate too. So the memory grant may well not be sufficient for either of the two hash joins or the sort.

enter image description here

Third Try

Pre-materialise the existing poorly estimated join in a #temp table so SQL Server takes account of the actual number of rows for the remaining two joins.

CREATE TABLE #T (
     PersonagemId     INT,
     PersonagemTitulo VARCHAR(100), /*Made up datatypes. Change as needed*/
     PersonagemNome   VARCHAR(100),
     ConteudoId       INT )

INSERT INTO #T
SELECT personagem.PersonagemId,
       personagem.PersonagemTitulo,
       personagem.PersonagemNome,
       conteudoPersonagem.ConteudoId
FROM   Personagens AS personagem
       JOIN ConteudosPersonagens AS conteudoPersonagem
         ON personagem.PersonagemId = conteudoPersonagem.PersonagemId
WHERE  personagem.CategoriaId = 2595

SELECT #T.PersonagemId,
       #T.PersonagemTitulo,
       #T.PersonagemNome,
       count(1) AS PersonagemTotal
FROM   #T
       JOIN Conteudos AS conteudo
         ON #T.ConteudoId = conteudo.ConteudoId
       JOIN Usuarios AS usuario
         ON conteudo.UsuarioId = usuario.UsuarioId
WHERE  conteudo.ConteudoTipo = 6
       AND conteudo.ConteudoExcluido = 0
       AND usuario.UsuarioExcluido = 0
       AND usuario.UsuarioBanido = 0
GROUP  BY #T.PersonagemId,
          #T.PersonagemTitulo,
          #T.PersonagemNome
ORDER  BY PersonagemTotal DESC 

Upvotes: 1

Related Questions