Reputation: 1312
I have to query once per 10 minutes for the amount of users that have been active the last 1, 24, 724 and 3024 hours from a datapool where we store one line per user action.
When a user does something, we store the hashed userId, hashed action, timestamp and the group the user belongs to in a table. This table is used for a lot of statistical purposes (e.g. decide which features are used most, which features lead to user loss and so on.)
However the query that happens most often on this table is to get the amount of unique users in a given period of time.
SELECT
count(user) as "1m",
count(*) FILTER (WHERE "timestamp" >= (now() - interval '7 days')::timestamp) as "1w",
count(*) FILTER (WHERE "timestamp" >= (now() - interval '1 day')::timestamp) as "1d",
count(*) FILTER (WHERE "timestamp" >= (now() - interval '1 hour')::timestamp) as "1h"
FROM (
SELECT
"user" as "user",
(max(timestamp) + interval '1 hour')::timestamp as "timestamp"
FROM public.user_activity
WHERE
public.user_activity."timestamp" >= (now() - interval '1 month')::timestamp
AND "system" = 'enterprise'
GROUP BY "user"
) as a
so in the subquery
this subquery returns usually between 10k and 100k entries (but should work for more, too)
then we do another query on this subquery:
This query runs on a few million entries (growing rapidly).
How can I improve the query to run faster? What indexes would be beneficial? (Using AWS RDS hitting the IOPS limit of our 100GB SSD)
Upvotes: 0
Views: 88
Reputation: 1269873
I would recommend an index on user_activity(system, timestamp, user)
.
If all rows are visible in the index, then this covers the subquery.
Not much can be done for the outer query.
However, I wonder if phrasing the query like this:
SELECT . . .
FROM (SELECT DISTINCT ON (user) "user" as "user",
(timestamp + interval '1 hour')::timestamp as timestamp
FROM public.user_activity ua
WHERE "system" = 'enterprise'
ORDER BY user, timestamp DESC
) ua
WHERE ua.timestamp >= now() - interval '1 month';
(Note: The filtering may be off by an hour. It is a bit hard to follow your exact date filtering logic.)
And with an index on user_activity(system, user, timestamp desc)
would provide better performance.
Upvotes: 1