Reputation: 328
In my project I have a pretty simple table which looks like this:
create table entity
(
    id         integer   default 1,
    session_id varchar   not null,
    type       integer   not null,
    category   integer   not null,
    created    timestamp default now() not null
)
with (autovacuum_enabled = false);

create index created_index
    on entity (created);
I also have a view that selects a grouped result of the entries from the last 30 seconds, which looks like this:
create view list(type, category, counter) as
SELECT
    type,
    category,
    count(entity.id) AS counter
FROM entity
WHERE entity.created >= (now() - '00:00:30'::interval)
GROUP BY entity.type, entity.category;
Because there are no updates or deletes happening to the table, I have already set it to unlogged and disabled autovacuum.
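For reference, the autovacuum setting is in the CREATE TABLE above, and the switch to unlogged was done on the existing table with something like this (ALTER TABLE ... SET UNLOGGED needs PostgreSQL 9.5 or later):

-- Switch an existing table to unlogged; its writes then skip the WAL.
ALTER TABLE entity SET UNLOGGED;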
The table now has roughly 20 million entries, and the average select time for SELECT type, category, counter FROM list is around 2 seconds.
Is there anything I can optimize to speed up the selects, or is the current speed already the maximum one can expect from such a large table?
EDIT:
This is the output of EXPLAIN:
Subquery Scan on list  (cost=9.37..9.73 rows=18 width=16) (actual time=425.268..425.278 rows=24 loops=1)
  Output: list.type, list.category, list.counter
  Buffers: shared hit=169485
  ->  HashAggregate  (cost=9.37..9.55 rows=18 width=16) (actual time=425.267..425.272 rows=24 loops=1)
        Output: entity.type, entity.category, count(entity.id)
        Group Key: entity.type, entity.category
        Buffers: shared hit=169485
        ->  Index Scan using created_index on entity  (cost=0.57..9.13 rows=32 width=12) (actual time=0.050..228.416 rows=165470 loops=1)
              Output: entity.id, entity.session_id, entity.type, entity.category, entity.created
              Index Cond: (entity.created >= (now() - '00:00:30'::interval))
              Buffers: shared hit=169485
Planning Time: 0.204 ms
Execution Time: 425.327 ms
The execution time looks pretty good, but this was executed while the system was at rest. Usually there are about 1000 inserts per second into the table.
About autovacuum: disabling it was a desperate attempt to see if it would improve anything. Should I enable it again?
Upvotes: 2
Views: 112
Reputation: 108641
This is a job for a covering index. If you create a compound index that can satisfy the whole query, you'll have a chance at doing that costly HashAggregate a different way.
Covering indexes typically work best for a limited set of queries. For yours, it's this one:
CREATE INDEX entity_cr_ty_ca_id ON entity(created, type, category) INCLUDE (id);
This works well because the query can:

1. Random-access the index to the first eligible created value.
2. Scan the index sequentially from there. It's a B-tree index, so the values of type and category are in a useful order.
3. Pull the id value from the index to check whether it's null before counting it (COUNT(entity.id) only counts non-null values).
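To confirm the planner actually picks it up, run EXPLAIN against the view once the index is in place; ideally the plan shows an Index Only Scan on the new index. A sketch, using the index name from above:

-- Look for "Index Only Scan using entity_cr_ty_ca_id" in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT type, category, counter FROM list;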
If you know the id values are never null, you can simplify this: use COUNT(*) in place of COUNT(entity.id), and leave id out of the index, creating it like this instead.
CREATE INDEX entity_cr_ty_ca ON entity(created, type, category);
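If you go that route, the view would be recreated to match; a sketch, assuming id really is never null so COUNT(*) gives the same result:

-- Same view as in the question, with COUNT(*) instead of COUNT(entity.id).
CREATE OR REPLACE VIEW list(type, category, counter) AS
SELECT
    type,
    category,
    count(*) AS counter
FROM entity
WHERE entity.created >= (now() - '00:00:30'::interval)
GROUP BY entity.type, entity.category;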
And, it has to be said: even if you get your DBMS to generate a large result set quickly, it still has to be transmitted to and interpreted by the program that requests it. There's no index sorcery that makes that faster.
Upvotes: 3