Reputation: 181
I am evaluating postgres to migrate away from Oracle. The following query runs too slow, also please find the explain plan:
explain analyze select DISTINCT EVENT.ID,
ORIGIN.ID AS RIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN,
EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE,
ORIGIN.DEPTH, ORIGIN.EVTYPE, ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR,
ORIGIN.CONTRIBUTOR OCONTRIBUTOR,
MAGNITUDE.ID AS MAGID, MAGNITUDE.MAGNITUDE, MAGNITUDE.TYPE AS MAGTYPE
from event.event
left join event.origin on event.id = origin.eventid
left join event.magnitude on origin.id = event.magnitude.origin_id
WHERE EXISTS(
select origin_id from event.magnitude
where magnitude.magnitude>=7.2
and origin.id=origin_id)
order by ORIGIN.TIME desc, MAGNITUDE.MAGNITUDE desc,
EVENT.ID,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID
"Unique (cost=740549.86..741151.42 rows=15039 width=80) (actual time=17791.557..17799.092 rows=5517 loops=1)"
" -> Sort (cost=740549.86..740587.45 rows=15039 width=80) (actual time=17791.556..17792.220 rows=5517 loops=1)"
" Sort Key: origin."time", event.magnitude.magnitude, event.id, event.preferred_origin_id, origin.id, event.contributor, origin.latitude, origin.longitude, origin.depth, origin.evtype, origin.catalog, origin.author, origin.contributor, event.magnitude.id, event.magnitude.type"
" Sort Method: quicksort Memory: 968kB"
" -> Nested Loop Left Join (cost=34642.50..739506.42 rows=15039 width=80) (actual time=6.927..17769.788 rows=5517 loops=1)"
" -> Hash Semi Join (cost=34642.50..723750.23 rows=14382 width=62) (actual time=6.912..17744.858 rows=2246 loops=1)"
" Hash Cond: (origin.id = event.magnitude.origin_id)"
" -> Merge Left Join (cost=0.00..641544.72 rows=6133105 width=62) (actual time=0.036..16221.008 rows=6133105 loops=1)"
" Merge Cond: (event.id = origin.eventid)"
" -> Index Scan using event_key_index on event (cost=0.00..163046.53 rows=3272228 width=12) (actual time=0.017..1243.616 rows=3276192 loops=1)"
" -> Index Scan using origin_fk_index on origin (cost=0.00..393653.81 rows=6133105 width=54) (actual time=0.013..3033.657 rows=6133105 loops=1)"
" -> Hash (cost=34462.73..34462.73 rows=14382 width=4) (actual time=6.668..6.668 rows=3198 loops=1)"
" Buckets: 2048 Batches: 1 Memory Usage: 113kB"
" -> Bitmap Heap Scan on magnitude (cost=324.65..34462.73 rows=14382 width=4) (actual time=1.682..5.414 rows=3198 loops=1)"
" Recheck Cond: (magnitude >= 7.2)"
" -> Bitmap Index Scan on mag_index (cost=0.00..321.05 rows=14382 width=0) (actual time=1.331..1.331 rows=3198 loops=1)"
" Index Cond: (magnitude >= 7.2)"
" -> Index Scan using mag_fkey_index on magnitude (cost=0.00..1.06 rows=3 width=22) (actual time=0.007..0.009 rows=2 loops=2246)"
" Index Cond: (origin.id = event.magnitude.origin_id)"
"Total runtime: 17799.669 ms"
This query runs in Oracle in 1 second while takes 16 seconds in postgres, The difference tells me that I am doing something wrong somewhere. This is a new installation on a local Mac machine with 12G of RAM.
I have:
effective_cache_size=4096MB
shared_buffer=2048MB
work_mem=100MB
Upvotes: 2
Views: 1963
Reputation: 51
According to the plan:
Sort (cost=740549.86..740587.45 rows=15039 width=80) (actual time=17791.55 ...
The sort operator was the main bottleneck. Try removing the order by
statement?
Upvotes: 0
Reputation: 425753
Remove DISTINCT
from your SELECT
clause. It's redundant since you are selecting PRIMARY KEY
columns anyway.
Change the LEFT JOIN
into the INNER JOIN
. As others mentioned, your EXISTS
clause eliminates all possible NULL
outputs generated by the LEFT JOIN
.
Create two indexes:
magnitude (magnitude, origin_id)
magnitude (origin_id, magnitude)
(each of them may be more efficient depending of how selective is your magnitude >= 7.2
conditions)
SELECT EVENT.ID,
ORIGIN.ID AS RIGINID,EVENT.PREFERRED_ORIGIN_ID AS PREFERRED_ORIGIN,
EVENT.CONTRIBUTOR, ORIGIN.TIME, ORIGIN.LATITUDE, ORIGIN.LONGITUDE,
ORIGIN.DEPTH, ORIGIN.EVTYPE, ORIGIN.CATALOG, ORIGIN.AUTHOR OAUTHOR,
ORIGIN.CONTRIBUTOR OCONTRIBUTOR,
MAGNITUDE.ID AS MAGID, MAGNITUDE.MAGNITUDE, MAGNITUDE.TYPE AS MAGTYPE
FROM event.event
JOIN event.origin
ON origin.eventid = event.id
JOIN event.magnitude
ON magnitude.origin_id = origin.id
WHERE origin.id IN
(
SELECT origin_id
FROM magnitude
WHERE magnitude >= 7.2
)
ORDER BY
ORIGIN.TIME desc, MAGNITUDE.MAGNITUDE desc,
EVENT.ID,EVENT.PREFERRED_ORIGIN_ID,ORIGIN.ID
A sample query to check:
WITH event (id) AS
(
VALUES
(1)
),
origin (id, eventid) AS
(
VALUES
(1, 1),
(2, 1)
),
magnitude (id, origin_id, magnitude) AS
(
VALUES
(1, 1, 4),
(2, 1, 8),
(3, 3, 6)
)
SELECT *
FROM event
LEFT JOIN
origin
ON origin.eventid = event.id
LEFT JOIN
magnitude
ON magnitude.origin_id = origin.id
WHERE EXISTS
(
SELECT origin_id
FROM magnitude
WHERE magnitude.magnitude >= 7.2
AND origin.id = origin_id
)
Both magnitudes, 4
and 8
, are returned for origin 1
.
Upvotes: 1
Reputation: 95761
There are useful links for learning how to interpret your own execution plan at the PostgreSQL wiki.
Upvotes: 0
Reputation: 25148
As has already been answered in at least one of the other forums you posted this question in, change the " left join event.origin on event.id = origin.eventid " part to be an inner join, since your EXISTS query further down restricts it to that anyway.
Upvotes: 0