user3491649
user3491649

Reputation: 31

How to speed up queries on non indexed columns SQL

I have a table of error logs with around 300 million rows. There is an index on the Date column but I am trying to query by both date and error message. When I query by date it is fast but I need to query by message as well which slows it down.

My query is as follows

WITH data_cte(errorhour, message) 
     AS (SELECT Datepart(hh, date) AS ErrorDay, 
                message 
         FROM   cloud.errorlog 
         WHERE  date <= '2016-06-02' 
           AND  date >= '2016-06-01') 
SELECT errorhour, 
       Count(*) AS count, 
       message 
FROM   data_cte 
WHERE  message = 'error connecting to the server' 
GROUP  BY errorhour 
ORDER  BY errorhour 

adding the where clause slows it down because Message is not indexed. How can I speed it up?

EDIT: I cannot index on Message because it is defined as varchar(max).

Upvotes: 2

Views: 3433

Answers (4)

Ross Presser
Ross Presser

Reputation: 6255

If you will ALWAYS be searching for the text 'error connecting to the server' then you can use a filtered index:

CREATE INDEX ix_ectts ON ErrorLog (Date) 
   WHERE (Date between '2016-06-01' and '2016-06-02')
     AND Message='error connecting to the server';

This index should be fairly small in bytes consumed, and quick to consult. It may be fairly slow to update however; consider creating it every time you need to run this query and dropping it afterward.

Another choice is to use a computed column on the first few hundred characters of Message, and index on that:

ALTER TABLE ErrorLog
   ADD Message_index AS (cast (Message as varchar(400)));

CREATE INDEX theIndex ON ErrorLog (Message_index, [date]);

EDIT: added missing parentheses after cast

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269753

You can simplify the query to:

     SELECT Datepart(day, date) AS ErrorDay, datepart(hour, date) as ErrorHour
            count(*) 
     FROM cloud.errorlog 
     WHERE date <= '2016-06-02' AND  date >= '2016-06-01') AND
           message = 'error connecting to the server'
     GROUP BY Datepart(day, date), datepart(hour, date);

Then for this query, you want an index on errorlog(message, date). It is important that the message be first in the index, because of the equality comparison.

EDIT:

If the message is too long and you want queries like this, I would recommend adding a computed column and use that for the index and where clause:

alter table errlog add message250 as (left(message, 250));

create index idx_errlog_message250_date on (message250, date);

And then write the query as:

     SELECT Datepart(day, date) AS ErrorDay, datepart(hour, date) as ErrorHour
            count(*) 
     FROM cloud.errorlog 
     WHERE date <= '2016-06-02' AND  date >= '2016-06-01') AND
           message250 = 'error connecting to the server'
     GROUP BY Datepart(day, date), datepart(hour, date);

Upvotes: 0

John D
John D

Reputation: 1637

If it is possible to extract a short summary of the error message, you could then include that in the INSERT to the log into a new column say error_summary and you could index on that and use it in the SELECT.

You'd parse the full error message and strip out timestamps, userid's and specifics such as server name and maybe stack traces. If there is no clear parsing, leave error_summary as null. You could then do a preliminary search on error_summary and fall back to a search on Message if that failed.

Upvotes: 0

Juan Carlos Oropeza
Juan Carlos Oropeza

Reputation: 48197

Just create a composite index for (date, message) and filter on the internal cte, not outside.

WITH data_cte(errorhour, message) 
     AS (SELECT Datepart(hh, date) AS ErrorDay, 
                message 
         FROM   cloud.errorlog 
         WHERE  date BETWEEN '2016-06-01' AND '2016-06-02'
           AND  message = 'error connecting to the server'
         )

Upvotes: 2

Related Questions