Reputation: 31
I have a table of error logs with around 300 million rows. There is an index on the Date column but I am trying to query by both date and error message. When I query by date it is fast but I need to query by message as well which slows it down.
My query is as follows
WITH data_cte(errorhour, message)
AS (SELECT Datepart(hh, date) AS ErrorDay,
message
FROM cloud.errorlog
WHERE date <= '2016-06-02'
AND date >= '2016-06-01')
SELECT errorhour,
Count(*) AS count,
message
FROM data_cte
WHERE message = 'error connecting to the server'
GROUP BY errorhour
ORDER BY errorhour
adding the where clause slows it down because Message is not indexed. How can I speed it up?
EDIT: I cannot index on Message
because it is defined as varchar(max)
.
Upvotes: 2
Views: 3433
Reputation: 6255
If you will ALWAYS be searching for the text 'error connecting to the server'
then you can use a filtered index:
CREATE INDEX ix_ectts ON ErrorLog (Date)
WHERE (Date between '2016-06-01' and '2016-06-02')
AND Message='error connecting to the server';
This index should be fairly small in bytes consumed, and quick to consult. It may be fairly slow to update however; consider creating it every time you need to run this query and dropping it afterward.
Another choice is to use a computed column on the first few hundred characters of Message
, and index on that:
ALTER TABLE ErrorLog
ADD Message_index AS (cast (Message as varchar(400)));
CREATE INDEX theIndex ON ErrorLog (Message_index, [date]);
EDIT: added missing parentheses after cast
Upvotes: 2
Reputation: 1269753
You can simplify the query to:
SELECT Datepart(day, date) AS ErrorDay, datepart(hour, date) as ErrorHour
count(*)
FROM cloud.errorlog
WHERE date <= '2016-06-02' AND date >= '2016-06-01') AND
message = 'error connecting to the server'
GROUP BY Datepart(day, date), datepart(hour, date);
Then for this query, you want an index on errorlog(message, date)
. It is important that the message
be first in the index, because of the equality comparison.
EDIT:
If the message is too long and you want queries like this, I would recommend adding a computed column and use that for the index and where
clause:
alter table errlog add message250 as (left(message, 250));
create index idx_errlog_message250_date on (message250, date);
And then write the query as:
SELECT Datepart(day, date) AS ErrorDay, datepart(hour, date) as ErrorHour
count(*)
FROM cloud.errorlog
WHERE date <= '2016-06-02' AND date >= '2016-06-01') AND
message250 = 'error connecting to the server'
GROUP BY Datepart(day, date), datepart(hour, date);
Upvotes: 0
Reputation: 1637
If it is possible to extract a short summary of the error message, you could then include that in the INSERT to the log into a new column say error_summary
and you could index on that and use it in the SELECT.
You'd parse the full error message and strip out timestamps, userid's and specifics such as server name and maybe stack traces. If there is no clear parsing, leave error_summary
as null
. You could then do a preliminary search on error_summary
and fall back to a search on Message
if that failed.
Upvotes: 0
Reputation: 48197
Just create a composite index for (date, message)
and filter on the internal cte, not outside.
WITH data_cte(errorhour, message)
AS (SELECT Datepart(hh, date) AS ErrorDay,
message
FROM cloud.errorlog
WHERE date BETWEEN '2016-06-01' AND '2016-06-02'
AND message = 'error connecting to the server'
)
Upvotes: 2