Reputation: 1940
I have a table worker_activity_events in PostgreSQL 11:
worker_id integer not null
created_at timestamp default now() not null
event_type text
Each record must have a worker_id and created_at. The query I would like to run often is:
SELECT * FROM worker_activity_events
WHERE worker_id = $1
AND created_at BETWEEN $2 AND $3
To run the query fast, is it reasonable to add PRIMARY KEY (worker_id, created_at)?
A concern might be: if two events for the same worker are generated at the same timestamp, the second one will be rejected because it violates the primary key (worker_id, created_at). Let's say my app can prevent this from happening.
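To make it concrete, the definition I have in mind is roughly this (same columns as above, just with the composite key added):
CREATE TABLE worker_activity_events (
    worker_id  integer   NOT NULL,
    created_at timestamp NOT NULL DEFAULT now(),
    event_type text,
    PRIMARY KEY (worker_id, created_at)
);
-- A second event for the same worker at the same timestamp would then
-- fail with a unique-violation error on the primary key.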
Upvotes: 2
Views: 729
Reputation: 44423
What would the primary key be, if not for this consideration?
You can create a composite index on (worker_id, created_at). There is no reason to declare it to be a primary key just to get it as an index.
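For example (the index name is generated automatically here; give it an explicit name if you prefer):
CREATE INDEX ON worker_activity_events (worker_id, created_at);
-- supports: WHERE worker_id = $1 AND created_at BETWEEN $2 AND $3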
But you can also create an index, or maybe even a primary key, on (worker_id, created_at, event_type). That index should be able to do everything the other one can, and more. Unless event_type is very wide, it shouldn't be much bigger. A downside is that if you update rows to change just event_type (which doesn't seem very likely, based on the column names), this index would disable the heap-only tuple (HOT) optimization.
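For example (since those are all of the table's columns, this could also allow index-only scans for the query above, assuming the visibility map is reasonably up to date):
CREATE INDEX ON worker_activity_events (worker_id, created_at, event_type);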
Upvotes: 0
Reputation: 248305
From the standpoint of database theory, I would say that you should define the primary key based on what really identifies a row uniquely, not based on performance considerations.
So if there is no natural primary key, define an artificial one, and use CREATE INDEX to create the index you need for the query.
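For example, a sketch with an identity column as the artificial key (just one way to do it):
CREATE TABLE worker_activity_events (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    worker_id  integer   NOT NULL,
    created_at timestamp NOT NULL DEFAULT now(),
    event_type text
);

-- the index the query needs, separate from the primary key
CREATE INDEX ON worker_activity_events (worker_id, created_at);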
However, in real life you sometimes have to deviate from the theoretical ideal. If performance considerations dictate that you have as few indexes as possible, and you can live with the primary key you suggest, go for it. Otherwise stick with the theory - premature optimization is the root of all evil.
Upvotes: 1