Makaroni
Makaroni

Reputation: 912

How to speed up count(distinct) with Between clause in MySQL

I have a MySQL table of 10 million rows and 3 columns, in following format:

id                                     time                               num

ca65e871-d758-437e-b76f-175234760e7b  2020-11-14 23:08:05.553770          11112222222
...

For running the first query below, I indexed the table on (num, time) and it works very fast (<5 milliseconds on 10 million rows table):

SELECT COUNT(*) 
FROM TABLE_NAME 
WHERE time >= '2020-11-14 23:08:05.553752' AND num = 11112222222

However I also need to execute count(distinct) on the same table with between clause, something like this:

SELECT COUNT(DISTINCT num) 
FROM TABLE_NAME 
WHERE time >= '2020-11-14 23:08:05.553752'
  AND num BETWEEN (11112222222 - 30)
              AND (11112222222 + 30)

This turns out to be significantly slower, around 200 milliseconds. Is there a way to speed the execution time of the second query on the same table?

Upvotes: 0

Views: 721

Answers (2)

Rick James
Rick James

Reputation: 142433

This is a 2D problem. And your WHERE clause is like a "bounding box".

Add INDEX(time, num) to give the Optimizer another choice.

If you expect to have more complex queries based on 2 ranges, see http://mysql.rjweb.org/doc.php/find_nearest_in_mysql

Upvotes: 1

Akina
Akina

Reputation: 42728

If your MySQl is 8+ then try:

WITH RECURSIVE
cte AS ( SELECT 11112222222 - 30 num
         UNION ALL
         SELECT num + 1 FROM cte WHERE num < 11112222222 + 30 )
SELECT COUNT(*)
FROM cte
WHERE EXISTS ( SELECT NULL
               FROM TABLE_NAME 
               WHERE TABLE_NAME.num = cte.num
                 AND time >= '2020-11-14 23:08:05.553752' )

If you'll often execute such query then I'd suggest to create service table with the numbers from -30 to 30 and use it instead of recursive CTE.

Upvotes: 1

Related Questions