Reputation: 328
In my project I have a pretty simple table which looks like this:
create table entity
(
    id         integer   default 1,
    session_id varchar   not null,
    type       integer   not null,
    category   integer   not null,
    created    timestamp default now() not null
)
with (autovacuum_enabled = false);

create index created_index
    on entity (created);
I also have a view that selects a grouped result of the entries from the last 30 seconds, which looks like this:
create view list(type, category, counter) as
SELECT
    type,
    category,
    count(entity.id) AS counter
FROM entity
WHERE entity.created >= (now() - '00:00:30'::interval)
GROUP BY entity.type, entity.category;
Because there are no updates or deletes happening to the table, I have already set it to unlogged and disabled autovacuum.
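For reference, the autovacuum setting is in the CREATE TABLE above, and the switch to unlogged was done on the existing table with something like this (ALTER TABLE ... SET UNLOGGED needs PostgreSQL 9.5 or later):

-- Switch an existing table to unlogged; its writes then skip the WAL.
ALTER TABLE entity SET UNLOGGED;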
The table now has roughly 20 million entries, and the average select time for SELECT type, category, counter FROM list is around 2 seconds.
Is there anything I can optimize to speed up the selects, or is the current speed already the maximum one can expect from such a large table?
EDIT:
This is the output of EXPLAIN:
Subquery Scan on list  (cost=9.37..9.73 rows=18 width=16) (actual time=425.268..425.278 rows=24 loops=1)
  Output: list.type, list.category, list.counter
  Buffers: shared hit=169485
  ->  HashAggregate  (cost=9.37..9.55 rows=18 width=16) (actual time=425.267..425.272 rows=24 loops=1)
        Output: entity.type, entity.category, count(entity.id)
        Group Key: entity.type, entity.category
        Buffers: shared hit=169485
        ->  Index Scan using created_index on entity  (cost=0.57..9.13 rows=32 width=12) (actual time=0.050..228.416 rows=165470 loops=1)
              Output: entity.id, entity.session_id, entity.type, entity.category, entity.created
              Index Cond: (entity.created >= (now() - '00:00:30'::interval))
              Buffers: shared hit=169485
Planning Time: 0.204 ms
Execution Time: 425.327 ms
The execution time looks pretty good, but this was executed while the system was at rest. Usually there are about 1000 inserts per second into the table.
About autovacuum: disabling it was a desperate attempt to see if it would improve anything. Should I enable it again?
Upvotes: 2
Views: 112
Reputation: 108641
This is a job for a covering index. If you create a compound index that can satisfy the whole query, you'll have a chance at doing that costly HashAggregate a different way.
Covering indexes typically work best for a limited set of queries. For yours, it's this one:
CREATE INDEX entity_cr_ty_ca_id ON entity(created, type, category) INCLUDE (id);
This works well because the query can:

1. Random-access the index to the first eligible created value.
2. Scan the index sequentially from there. It's a B-tree index, so the values of type and category are in a useful order.
3. Pull the id value from the index to check whether it's null before counting it (COUNT(entity.id) only counts non-null values).
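To confirm the planner actually picks it up, run EXPLAIN against the view once the index is in place; ideally the plan shows an Index Only Scan on the new index. A sketch, using the index name from above:

-- Look for "Index Only Scan using entity_cr_ty_ca_id" in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT type, category, counter FROM list;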
If you know the id values are never null, you can simplify this: use COUNT(*) in place of COUNT(entity.id), and leave id out of the index, creating it like this instead.
CREATE INDEX entity_cr_ty_ca ON entity(created, type, category);
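If you go that route, the view would be recreated to match; a sketch, assuming id really is never null so COUNT(*) gives the same result:

-- Same view as in the question, with COUNT(*) instead of COUNT(entity.id).
CREATE OR REPLACE VIEW list(type, category, counter) AS
SELECT
    type,
    category,
    count(*) AS counter
FROM entity
WHERE entity.created >= (now() - '00:00:30'::interval)
GROUP BY entity.type, entity.category;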
And, it has to be said: even if you get your DBMS to generate a large result set quickly, it still has to be transmitted to and interpreted by the program that requests it. There's no index sorcery that makes that faster.
Upvotes: 3