Simba
Simba

Reputation: 61

Get the n most frequent values

I'm using MS SQL and I want to get the n most frequent values of a given field. The current query is :

SELECT DISTINCT e.message
FROM monitor_main m inner join monitor_event e on (e.mainId = m.id)
WHERE 
e.statement = @statement
and m.dateStart >= @dateStart
and m.dateEnd <= @dateEnd

When I execute that query, I get a set of messages that have a date between @dateStart and @dateEnd. From this set I want to get the n most frequent e.message, I don't want to use PHP as I assume that it's not efficient. Note that I have a lot of rows in these tables (millions rows). I tried (n=10) :

SELECT DISTINCT e.message, COUNT(e.message)
FROM monitor_main m inner join monitor_event e on (e.mainId = m.id)
WHERE 
e.statement = @statement
and m.dateStart >= @dateStart
and m.dateEnd <= @dateEnd
GROUP BY e.message ORDER BY COUNT(e.message)
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY

But I'm not sure that it solves my problem, if it does, is it efficient ? Can someone help me or suggest me another query ?

N.B. I cannot test it as I've a lot of rows and I have no rights to create a TEST table for now.

Thanks

Upvotes: 1

Views: 50

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1271151

You don't need the SELECT DISTINCT and you need DESC for the ORDER BY:

SELECT e.message, COUNT(e.message)
FROM monitor_main m INNER JOIN
     monitor_event e 
     ON e.mainId = m.id
WHERE e.statement = @statement AND
      m.dateStart >= @dateStart AND
      m.dateEnd <= @dateEnd
GROUP BY e.message
ORDER BY COUNT(e.message) DESC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY;

For performance, you want probably want indexes on monitor_event(statement, mainId, message), monitor_main(id, dateStart, dateEnd).

Upvotes: 2

Related Questions