Reputation: 61
I'm using MS SQL and I want to get the n most frequent values of a given field. The current query is :
SELECT DISTINCT e.message
FROM monitor_main m inner join monitor_event e on (e.mainId = m.id)
WHERE
e.statement = @statement
and m.dateStart >= @dateStart
and m.dateEnd <= @dateEnd
When I execute that query, I get a set of messages that have a date between @dateStart and @dateEnd. From this set I want to get the n most frequent e.message, I don't want to use PHP as I assume that it's not efficient. Note that I have a lot of rows in these tables (millions rows). I tried (n=10) :
SELECT DISTINCT e.message, COUNT(e.message)
FROM monitor_main m inner join monitor_event e on (e.mainId = m.id)
WHERE
e.statement = @statement
and m.dateStart >= @dateStart
and m.dateEnd <= @dateEnd
GROUP BY e.message ORDER BY COUNT(e.message)
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY
But I'm not sure that it solves my problem, if it does, is it efficient ? Can someone help me or suggest me another query ?
N.B. I cannot test it as I've a lot of rows and I have no rights to create a TEST table for now.
Thanks
Upvotes: 1
Views: 50
Reputation: 1271151
You don't need the SELECT DISTINCT
and you need DESC
for the ORDER BY
:
SELECT e.message, COUNT(e.message)
FROM monitor_main m INNER JOIN
monitor_event e
ON e.mainId = m.id
WHERE e.statement = @statement AND
m.dateStart >= @dateStart AND
m.dateEnd <= @dateEnd
GROUP BY e.message
ORDER BY COUNT(e.message) DESC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY;
For performance, you want probably want indexes on monitor_event(statement, mainId, message)
, monitor_main(id, dateStart, dateEnd)
.
Upvotes: 2