Reputation: 507
Running PostgreSQL 9.5:
SELECT events.id, events.start_time, events.host_id, events.title
FROM events
JOIN accountsevents ON accountsevents.events_id = events.id
WHERE accountsevents.accounts_id = %(account_id)s OR events.host_id = %(account_id)s
GROUP BY events.id
ORDER BY start_time DESC
I have this query, and Postgres says the cost is over 100k, which seems excessive. It is the only query I have that isn't using the indexes I've created for each table.
Table "public.events"
Indexes:
"events_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"events_host_id_fkey" FOREIGN KEY (host_id) REFERENCES accounts(id)
Referenced by:
TABLE "accountsevents" CONSTRAINT "accountsevents_events_id_fkey" FOREIGN KEY (events_id) REFERENCES events(id)
TABLE "eventsinterests" CONSTRAINT "eventsinterests_events_id_fkey" FOREIGN KEY (events_id) REFERENCES events(id)
Table "public.accountsevents"
Indexes:
"accountsevents_pkey" PRIMARY KEY, btree (id, accounts_id, events_id)
Foreign-key constraints:
"accountsevents_accounts_id_fkey" FOREIGN KEY (accounts_id) REFERENCES accounts(id)
"accountsevents_events_id_fkey" FOREIGN KEY (events_id) REFERENCES events(id)
I feel like the index is set up wrong, or I'm just missing something with the query. The initial sequential scans are killing it.
EDIT: added EXPLAIN (ANALYZE, VERBOSE) output
"Sort (cost=124388.27..124390.10 rows=732 width=40) (actual time=1533.902..1533.928 rows=470 loops=1)"
" Output: events.id, events.start_time, events.host_id, events.title"
" Sort Key: events.start_time DESC"
" Sort Method: quicksort Memory: 66kB"
" -> HashAggregate (cost=124346.12..124353.44 rows=732 width=40) (actual time=1533.658..1533.759 rows=470 loops=1)"
" Output: events.id, events.start_time, events.host_id, events.title"
" Group Key: events.id"
" -> Hash Join (cost=4912.30..124344.29 rows=732 width=40) (actual time=56.671..1532.831 rows=971 loops=1)"
" Output: events.id, events.start_time, events.host_id, events.title"
" Hash Cond: (accountsevents.events_id = events.id)"
" Join Filter: ((accountsevents.accounts_id = 1) OR (events.host_id = 1))"
" Rows Removed by Join Filter: 2761882"
" -> Seq Scan on public.accountsevents (cost=0.00..45228.52 rows=2762852 width=8) (actual time=0.005..466.094 rows=2762853 loops=1)"
" Output: accountsevents.events_id, accountsevents.accounts_id"
" -> Hash (cost=2795.91..2795.91 rows=104191 width=40) (actual time=53.579..53.579 rows=104181 loops=1)"
" Output: events.id, events.start_time, events.host_id, events.title"
" Buckets: 65536 Batches: 4 Memory Usage: 2462kB"
" -> Seq Scan on public.events (cost=0.00..2795.91 rows=104191 width=40) (actual time=0.004..26.171 rows=104181 loops=1)"
" Output: events.id, events.start_time, events.host_id, events.title"
"Planning time: 0.201 ms"
"Execution time: 1534.024 ms"
Upvotes: 0
Views: 52
Reputation: 246083
No index will help you with this query.
The problem is that you have an OR in the WHERE condition, so it is not possible to apply a filter before the tables are joined, which is where an index could help you. Because of that, PostgreSQL has to compute the whole join and can only filter rows out afterwards; see the great number of Rows Removed by Join Filter in your plan. Try replacing the OR with an AND and see how much better PostgreSQL can do.
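Spelled out, that experiment would look something like this (my concrete version of the suggestion; note that AND returns a different result set, so this is a diagnostic, not a fix):
-- Same query, but with AND instead of OR (diagnostic only: different results)
SELECT events.id, events.start_time, events.host_id, events.title
FROM events
JOIN accountsevents ON accountsevents.events_id = events.id
WHERE accountsevents.accounts_id = 1 AND events.host_id = 1
GROUP BY events.id
ORDER BY start_time DESC;
With AND, each condition can be applied at the scan of its own table, before the join, which is exactly the spot where an index can be used.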
The only thing an index could be used for would be a nested loop join, and that would be much more expensive. So I don't think that there is a better plan for this query.
You can see that PostgreSQL's estimates of the number of rows are pretty good, which is normally a sign that PostgreSQL indeed did the right thing.
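If you want to verify that, you can discourage the hash join in a test session and compare the plans; a minimal sketch (enable_hashjoin is a standard planner setting, and SET LOCAL confines the change to the transaction):
BEGIN;
-- Discourage the hash join so the planner picks an alternative
-- (typically a nested loop here), then compare costs and timing.
SET LOCAL enable_hashjoin = off;
EXPLAIN (ANALYZE, VERBOSE)
SELECT events.id, events.start_time, events.host_id, events.title
FROM events
JOIN accountsevents ON accountsevents.events_id = events.id
WHERE accountsevents.accounts_id = 1 OR events.host_id = 1
GROUP BY events.id
ORDER BY start_time DESC;
ROLLBACK;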
Maybe you can do better with a query like
SELECT * FROM
   (SELECT ... FROM events JOIN accountsevents ON ...
    WHERE accountsevents.accounts_id = 1
    UNION
    SELECT ... FROM events JOIN accountsevents ON ...
    WHERE events.host_id = 1) sub
GROUP BY ... ORDER BY ...
but I wouldn't bet on it.
(Note: this query is semantically slightly different, but maybe it doesn't matter in your case.)
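Filled in with the column list and join condition from the question, my concrete reading of that sketch would be the following. Since UNION already removes duplicate rows, and all selected columns are determined by events.id, the outer GROUP BY from the sketch becomes unnecessary:
SELECT *
FROM (SELECT events.id, events.start_time, events.host_id, events.title
      FROM events
      JOIN accountsevents ON accountsevents.events_id = events.id
      WHERE accountsevents.accounts_id = 1
      UNION
      SELECT events.id, events.start_time, events.host_id, events.title
      FROM events
      JOIN accountsevents ON accountsevents.events_id = events.id
      WHERE events.host_id = 1) sub
ORDER BY sub.start_time DESC;

-- Assumption on my part, not part of the answer above: for the branches
-- to be driven by index scans, matching indexes would be needed.
CREATE INDEX ON accountsevents (accounts_id);  -- branch 1 filter; the PK leads with id, so it can't serve this
CREATE INDEX ON events (host_id);              -- branch 2 filter; no existing index covers host_id
CREATE INDEX ON accountsevents (events_id);    -- branch 2 join back to accountsevents
Whether the planner then actually prefers the indexed plans over the hash join depends on your data distribution.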
Upvotes: 3