Reputation: 330
I have a SQL query as:
SELECT
title,
(COUNT(DISTINCT A.id)) AS "count_title"
FROM
B
INNER JOIN D ON B.app = D.app
INNER JOIN A ON D.number = A.number
INNER JOIN C ON A.id = C.id
GROUP BY C.title
ORDER BY count_title DESC
LIMIT 10
;
Table D contains 50M records, A contains 30M records, B & C contains 30k records each. Indexes are defined on all columns used in joins, group by, order by.
The query works fine without the order by statement and returns results in around 2-3 sec.
But, with the sorting operation(order by) the query time increases to 10-12 seconds.
I understand the reason behind this, that executor has to traverse all the records for sorting operation and index will hardly help here.
Are there some other ways to speed up this query?
Here is the explain analyze of this query:
"QUERY PLAN"
"Limit (cost=974652.20..974652.22 rows=10 width=54) (actual time=2817.579..2825.071 rows=10 loops=1)"
" Buffers: shared hit=120299 read=573195"
" -> Sort (cost=974652.20..974666.79 rows=5839 width=54) (actual time=2817.578..2817.578 rows=10 loops=1)"
" Sort Key: (count(DISTINCT A.id)) DESC"
" Sort Method: top-N heapsort Memory: 26kB"
" Buffers: shared hit=120299 read=573195"
" -> GroupAggregate (cost=974325.65..974526.02 rows=5839 width=54) (actual time=2792.465..2817.097 rows=3618 loops=1)"
" Group Key: C.title"
" Buffers: shared hit=120299 read=573195"
" -> Sort (cost=974325.65..974372.97 rows=18931 width=32) (actual time=2792.451..2795.161 rows=45175 loops=1)"
" Sort Key: C.title"
" Sort Method: quicksort Memory: 5055kB"
" Buffers: shared hit=120299 read=573195"
" -> Gather (cost=968845.30..972980.74 rows=18931 width=32) (actual time=2753.402..2778.648 rows=45175 loops=1)"
" Workers Planned: 1"
" Workers Launched: 1"
" Buffers: shared hit=120299 read=573195"
" -> Parallel Hash Join (cost=967845.30..970087.64 rows=11136 width=32) (actual time=2751.725..2764.832 rows=22588 loops=2)"
" Hash Cond: ((C.id)::text = (A.id)::text)"
" Buffers: shared hit=120299 read=573195"
" -> Parallel Seq Scan on C (cost=0.00..1945.87 rows=66687 width=32) (actual time=0.017..4.316 rows=56684 loops=2)"
" Buffers: shared read=1279"
" -> Parallel Hash (cost=966604.55..966604.55 rows=99260 width=9) (actual time=2750.987..2750.987 rows=20950 loops=2)"
" Buckets: 262144 Batches: 1 Memory Usage: 4032kB"
" Buffers: shared hit=120266 read=571904"
" -> Nested Loop (cost=219572.23..966604.55 rows=99260 width=9) (actual time=665.832..2744.270 rows=20950 loops=2)"
" Buffers: shared hit=120266 read=571904"
" -> Parallel Hash Join (cost=219571.79..917516.91 rows=99260 width=4) (actual time=665.804..2583.675 rows=20950 loops=2)"
" Hash Cond: ((D.app)::text = (B.app)::text)"
" Buffers: shared hit=8 read=524214"
" -> Parallel Bitmap Heap Scan on D (cost=217542.51..895848.77 rows=5126741 width=13) (actual time=661.254..1861.862 rows=6160441 loops=2)"
" Recheck Cond: ((action_type)::text = ANY ('{10,11}'::text[]))"
" Heap Blocks: exact=242152"
" Buffers: shared hit=3 read=523925"
" -> Bitmap Index Scan on D_index_action_type (cost=0.00..214466.46 rows=12304178 width=0) (actual time=546.470..546.471 rows=12320882 loops=1)"
" Index Cond: ((action_type)::text = ANY ('{10,11}'::text[]))"
" Buffers: shared hit=3 read=33669"
" -> Parallel Hash (cost=1859.36..1859.36 rows=13594 width=12) (actual time=4.337..4.337 rows=16313 loops=2)"
" Buckets: 32768 Batches: 1 Memory Usage: 1152kB"
" Buffers: shared hit=5 read=289"
" -> Parallel Index Only Scan using B_index_app on B (cost=0.29..1859.36 rows=13594 width=12) (actual time=0.015..2.218 rows=16313 loops=2)"
" Heap Fetches: 0"
" Buffers: shared hit=5 read=289"
" -> Index Scan using A_index_number on A (cost=0.43..0.48 rows=1 width=24) (actual time=0.007..0.007 rows=1 loops=41900)"
" Index Cond: ((number)::text = (D.number)::text)"
" Buffers: shared hit=120258 read=47690"
"Planning Time: 0.747 ms"
"Execution Time: 2825.118 ms"
Upvotes: 1
Views: 531
Reputation: 246383
You could try to aim for a nested loop join between b
and d
because b
is so much smaller:
CREATE INDEX ON d (app);
If d
is vacuumed frequently enough, you could see if an index only scan is even faster. For that, include number
in the index (in v11, use the INCLUDE
clause for that!). The EXPLAIN
output suggests that you have an extra condition on action_type
; you'd have to include that column too for an index only scan.
Upvotes: 1