Reputation: 1940
I have a table worker_activity_events in PostgreSQL 11:
worker_id integer not null
created_at timestamp default now() not null
event_type text
Each record must have a worker_id and created_at. The query I would like to run often is:
SELECT * FROM worker_activity_events
WHERE worker_id = $1
AND created_at BETWEEN $2 AND $3
To run the query fast, is it reasonable to add PRIMARY KEY (worker_id, created_at)?
A concern might be: if two events for the same worker are generated at the same timestamp, the second one will be rejected because it violates the primary key (worker_id, created_at). Let's say my app can prevent this from happening.
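To make it concrete, the definition I have in mind is roughly this (same columns as above, just with the composite key added):
CREATE TABLE worker_activity_events (
    worker_id  integer   NOT NULL,
    created_at timestamp NOT NULL DEFAULT now(),
    event_type text,
    PRIMARY KEY (worker_id, created_at)
);
-- A second event for the same worker at the same timestamp would then
-- fail with a unique-violation error on the primary key.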
Upvotes: 2
Views: 729
Reputation: 44423
What would the primary key be, if not for this consideration?
You can create a composite index on (worker_id, created_at). There is no reason to declare it to be a primary key just to get it as an index.
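For example (the index name is generated automatically here; give it an explicit name if you prefer):
CREATE INDEX ON worker_activity_events (worker_id, created_at);
-- supports: WHERE worker_id = $1 AND created_at BETWEEN $2 AND $3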
But you can also create an index, or maybe even a primary key, on (worker_id, created_at, event_type). That index should be able to do everything the other one can, and more. Unless event_type is very wide, it shouldn't be much bigger. A downside is that if you update rows to change just event_type (which doesn't seem very likely, based on the column names), this index would disable the heap-only tuple (HOT) optimization.
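For example (since those are all of the table's columns, this could also allow index-only scans for the query above, assuming the visibility map is reasonably up to date):
CREATE INDEX ON worker_activity_events (worker_id, created_at, event_type);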
Upvotes: 0
Reputation: 248305
From the standpoint of database theory, I would say that you should define the primary key based on what really identifies a row uniquely, not based on performance considerations.
So if there is no natural primary key, define an artificial one, and use CREATE INDEX to create the index you need for the query.
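For example, a sketch with an identity column as the artificial key (just one way to do it):
CREATE TABLE worker_activity_events (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    worker_id  integer   NOT NULL,
    created_at timestamp NOT NULL DEFAULT now(),
    event_type text
);

-- the index the query needs, separate from the primary key
CREATE INDEX ON worker_activity_events (worker_id, created_at);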
However, in real life you sometimes have to deviate from the theoretical ideal. If performance considerations dictate that you have as few indexes as possible, and you can live with the primary key you suggest, go for it. Otherwise stick with the theory - premature optimization is the root of all evil.
Upvotes: 1