Reputation: 876
I have three large-ish tables: A (10 million rows), A_C (60 million rows), and C (6 rows). They are related via bigint fields with primary/foreign key constraints. A and C have an n:m relation, which is why I added the junction table A_C. The schema is as follows:
A.id = A_C.a_id and C.id = A_C.c_id
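In DDL terms the setup is roughly this (a minimal sketch; payload columns and exact constraint definitions are assumptions):

CREATE TABLE a (
    id bigint PRIMARY KEY
    -- ... payload columns ...
);

CREATE TABLE c (
    id bigint PRIMARY KEY  -- primary key assumed here
    -- ... payload columns ...
);

-- junction table for the n:m relation between a and c
CREATE TABLE a_c (
    a_id bigint NOT NULL REFERENCES a (id),
    c_id bigint NOT NULL REFERENCES c (id)
);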
A is the primary table here. I want to display the first 100 entries from A together with all corresponding rows from C via A_C. Currently the query takes very long. My first approach was two simple joins (sketched below). Running EXPLAIN ANALYZE I found that PostgreSQL does a sequential scan on A. I think this is because it applies the LIMIT last and so effectively disregards it while joining. This performs very poorly. Is there a way to work around this LIMIT behavior, or is there another solution I am missing? When I perform a LIMIT on a simple select query, it performs just fine, in ~2 ms.
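The two-join version looked roughly like this (select list abbreviated):

SELECT A.id, C.*
FROM A
JOIN A_C ON A_C.a_id = A.id
JOIN C ON C.id = A_C.c_id
LIMIT 100;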
Edit: A.id is the primary key and has a btree index. A_C has a unique constraint over the entire row and also separate indexes on A_C.a_id and A_C.c_id. In the queries below I have not yet joined C, as they are already really slow without it.
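The constraints and indexes on the junction table would look roughly like this (a sketch; the constraint and index names are placeholders):

ALTER TABLE a_c ADD CONSTRAINT a_c_uniq UNIQUE (a_id, c_id);
CREATE INDEX a_c_a_id_idx ON a_c (a_id);
CREATE INDEX a_c_c_id_idx ON a_c (c_id);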
Fast queries:
SELECT A.id FROM A
SELECT A.id FROM A LIMIT 10
Slow query:
The following plans quickly but is slow to execute, since the amount of data is very large:
SELECT A.id FROM A JOIN A_C ON A.id = A_C.a_id
I tried to avoid that by setting a limit on how many rows are retrieved from A:
SELECT a.id FROM (SELECT A.id AS id FROM A LIMIT 100) a JOIN A_C ON (a.id = A_C.a_id)
The query plan via EXPLAIN ANALYZE
looks as follows:
"Hash Join (cost=5.75..1305897.53 rows=44790854 width=8) (actual time=251969.648..251969.648 rows=0 loops=1)"
" Hash Cond: (a_c.a_id = a.id)"
" -> Seq Scan on a_c (cost=0.00..690017.54 rows=44790854 width=8) (actual time=0.472..126203.253 rows=44790000 loops=1)"
" -> Hash (cost=4.50..4.50 rows=100 width=8) (actual time=1.889..1.889 rows=100 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 12kB"
" -> Limit (cost=0.00..3.50 rows=100 width=8) (actual time=0.527..1.550 rows=100 loops=1)"
" -> Seq Scan on a (cost=0.00..261275.88 rows=7465088 width=8) (actual time=0.520..0.912 rows=100 loops=1)"
"Planning time: 16.350 ms"
"Execution time: 251969.693 ms"
Upvotes: 1
Views: 1907
Reputation: 1269773
I would suggest a subquery for the rows from a:
select . . .
from (select a.*
      from a
      order by a.id
      fetch first 100 rows only  -- or limit 100
     ) a join
     a_c ac
     on ac.a_id = a.id join
     c
     on ac.c_id = c.id;
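The point of the derived table is to apply the ordering and the limit before the joins. With only 100 ids coming out of the subquery, the planner should be able to use a nested loop against the index on a_c(a_id) instead of hashing the entire junction table.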
Even though c is really small, I would encourage a primary key on c(id) if you don't already have one.
This assumes that "the first hundred rows" means by the ordering of the primary key.
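If the primary key on c is missing, adding it is a one-liner (assuming the id values in c are unique and not null):

alter table c add primary key (id);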
Upvotes: 2