Reputation: 1
I count the number of users this way; it takes about 5 seconds to produce results, and I am looking for a better solution:
SELECT COUNT(*)
FROM (SELECT user_id
FROM slot_result_primary
WHERE session_timestamp BETWEEN 1590598800000 AND 1590685199999
GROUP BY user_id) AS foo
Upvotes: 0
Views: 127
Reputation: 186
Counting is a very heavy operation in Postgres and should be avoided if possible. It is difficult to make it much faster, because for each row Postgres needs to go to the disk. You can create a better index so that the matching rows are picked from disk faster, but even then the count time will grow linearly with the size of the data.
Your index should be:
CREATE INDEX session_timestamp_user_id_index ON slot_result_primary (session_timestamp, user_id)
for best results.
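A minimal sketch of why this composite index helps, using an in-memory SQLite database as a stand-in for Postgres (hypothetical sample data; the table and index names follow the answer). With (session_timestamp, user_id) indexed together, the engine can answer the query from the index alone, without touching the table itself:

```python
import sqlite3

# Illustrative stand-in for Postgres: SQLite, hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE slot_result_primary (user_id INTEGER, session_timestamp INTEGER)"
)
# 300 rows, 100 distinct users, all inside the timestamp range from the question.
conn.executemany(
    "INSERT INTO slot_result_primary VALUES (?, ?)",
    [(u, 1590598800000 + u) for u in range(100) for _ in range(3)],
)
conn.execute(
    "CREATE INDEX session_timestamp_user_id_index "
    "ON slot_result_primary (session_timestamp, user_id)"
)
query = (
    "SELECT COUNT(*) FROM (SELECT user_id FROM slot_result_primary "
    "WHERE session_timestamp BETWEEN 1590598800000 AND 1590685199999 "
    "GROUP BY user_id)"
)
# The plan should show the index serving the range scan (typically as a
# covering index, since only the two indexed columns are referenced).
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row)
print(conn.execute(query).fetchone()[0])  # 100 distinct users
```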
Still, an index will not solve your count problem fully. In a similar situation I faced two days ago (a SELECT query running in 3 s and its count running in 1 s), dedicated indexes allowed me to push the SELECT down to 0.3 ms, but the best I could do with count was 700 ms.
Here is a good article summarising why count is difficult and the different ways to make it faster: https://www.citusdata.com/blog/2016/10/12/count-performance/
Upvotes: 0
Reputation: 6417
First of all, you can simplify the query:
SELECT COUNT(DISTINCT user_id)
FROM slot_result_primary
WHERE session_timestamp BETWEEN 1590598800000 AND 1590685199999
Most importantly, make sure you have an index on session_timestamp.
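A quick sketch showing that the simplified form returns the same number as the original grouped subquery, again using SQLite with hypothetical sample data as a stand-in for Postgres:

```python
import sqlite3

# Illustrative stand-in for Postgres: SQLite, hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE slot_result_primary (user_id INTEGER, session_timestamp INTEGER)"
)
# 50 rows spread over 10 distinct users, all inside the timestamp range.
conn.executemany(
    "INSERT INTO slot_result_primary VALUES (?, ?)",
    [(u % 10, 1590598800000 + u) for u in range(50)],
)
where = "session_timestamp BETWEEN 1590598800000 AND 1590685199999"
grouped = conn.execute(
    "SELECT COUNT(*) FROM (SELECT user_id FROM slot_result_primary "
    f"WHERE {where} GROUP BY user_id)"
).fetchone()[0]
distinct = conn.execute(
    f"SELECT COUNT(DISTINCT user_id) FROM slot_result_primary WHERE {where}"
).fetchone()[0]
print(grouped, distinct)  # 10 10
```

Both queries count each user once, but the COUNT(DISTINCT ...) form drops one level of nesting.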
Upvotes: 2