Reputation: 1025
Given I have 3 tables in DB, which contain different slices of data from same origin. All tables have pretty similar structure:
id | parent_id | timestamp | contents
Each table has parent_id (one parent to many records relationship) and timestamp index.
I need to access this data sorted on time basis. Currently I use next query:
prepare query3(bigint) as
select id, timestamp, contents, filter from
(select t1.id, t1.timestamp, t1.contents, 'filter1' as filter from table1 t1
where t1.parent_id = $1
union select t2.id, t2.timestamp, t2.contents, 'filter2' as filter from table2 t2
where t2.parent_id = $1
union select t3.id, t3.timestamp, t3.contents, 'filter3' as filter from table3 t3
where t3.parent_id = $1
) table_alias order by timestamp;
Since there are pretty much data in each table, it takes from 2 to 3 minutes each time I execute this query. According to explain: 650000 rows and Sort Method: external merge Disk: 186592kB
.
Is there any way to optimize a retrieval execution time without changing the schema, but building more effective query or creating a specific indexes?
Update added full explain analyze result here. In this case there are 4 tables in query, but I believe there is not much difference between 3 and 4 in this case.
"Sort (cost=83569.28..83959.92 rows=156258 width=80) (actual time=2288.871..2442.318 rows=639225 loops=1)"
" Sort Key: t1.timestamp"
" Sort Method: external merge Disk: 186592kB"
" -> Unique (cost=52685.43..54638.65 rows=156258 width=154) (actual time=1572.274..1885.966 rows=639225 loops=1)"
" -> Sort (cost=52685.43..53076.07 rows=156258 width=154) (actual time=1572.273..1737.041 rows=639225 loops=1)"
" Sort Key: t1.id, t1.timestamp, t1.contents, ('table1'::text)"
" Sort Method: external merge Disk: 186624kB"
" -> Append (cost=0.00..14635.39 rows=156258 width=154) (actual time=0.070..447.375 rows=639225 loops=1)"
" -> Index Scan using table1_parent_id on table1 t1 (cost=0.00..285.08 rows=5668 width=109) (actual time=0.068..5.993 rows=9385 loops=1)"
" Index Cond: (parent_id = $1)"
" -> Index Scan using table2_parent_id on table2 t2 (cost=0.00..11249.13 rows=132927 width=168) (actual time=0.063..306.567 rows=589056 loops=1)"
" Index Cond: (parent_id = $1)"
" -> Index Scan using table3_parent_id on table3 t3 (cost=0.00..957.18 rows=4693 width=40) (actual time=25.234..82.381 rows=20176 loops=1)"
" Index Cond: (parent_id = $1)"
" -> Index Scan using table4_parent_id_idx on table4 t4 (cost=0.00..581.42 rows=12970 width=76) (actual time=0.029..5.894 rows=20608 loops=1)"
" Index Cond: (parent_id = $1)"
"Total runtime: 2489.569 ms"
Upvotes: 1
Views: 135
Reputation: 1270553
A big chunk of your time is caused by eliminating duplicates for the union. Use union all instead:
select id, timestamp, contents, filter
from ((select t1.id, t1.timestamp, t1.contents, 'filter1' as filter
from table1 t1
where t1.parent_id = $1
)
union all
(select t2.id, t2.timestamp, t2.contents, 'filter2' as filter
from table2 t2
where t2.parent_id = $1
)
union all
(select t3.id, t3.timestamp, t3.contents, 'filter3' as filter
from table3 t3
where t3.parent_id = $1
)
) table_alias
order by timestamp;
To make this more effective, you should have an index on parent_id on each of the three tables. With those changes, it should run quite faste.
Upvotes: 1