Reputation: 31221
A while ago I posted a message about optimizing a query in MySQL. I have since ported the data and query to PostgreSQL, but now PostgreSQL has the same problem. The solution in MySQL was to force the optimizer to not optimize using STRAIGHT_JOIN. PostgreSQL offers no such option.
Update Revised
I have isolated the part of the query that fixes the problem (d.month_ref_id = 1
):
select
d.*
from
daily d
join month_ref m on m.id = d.month_ref_id
join year_ref y on y.id = m.year_ref_id
where
m.category_id = '001' and
d.month_ref_id = 1
However, I can't hard-code a month reference to 1
. The query that produces a full table scan is:
select
d.*
from
daily d
join month_ref m on m.id = d.month_ref_id
join year_ref y on y.id = m.year_ref_id
where
m.category_id = '001'
The index on daily.month_ref_id
is:
CREATE INDEX daily_month_ref_idx
ON climate.daily
USING btree
(month_ref_id);
Why is the query performing a full table scan and what can be done to avoid it?
Thank you!
Upvotes: 1
Views: 4309
Reputation: 537
I think the FTS is occurring because of the way you have parameterized your query / joins. By this, I mean that you have two parameters, one is being compared against a column in the 'daily' table, and another is being compared against a column in the 'month-ref' table. However, both of these values can be used to filter rows in a single table 'month-ref'. Make that table the primary table in your query and re-write your query as follows:
select
d.*
from month_ref m
join daily d on d.month_ref_id = m.id
join year_ref y on y.id = m.year_ref_id
where
m.category_id = '001' and
m.id = 1
This way, the database can easily locate all the necessary rows in the month-ref table based solely on the input parameter values, and rows in the daily table can be easily located over the specified join using the index you described. Depending on the number of rows might be found in the month-ref table, and whether either of the columns referenced above contain distinct values, you may need to create an index on the month-ref table.
Upvotes: 0
Reputation: 64674
Select avg(d.amount) AS amount, y.year
From (station s
Left Join city c -- You want to cross join on city? Why not use an Inner join?
On c.id = 10663
And 6371.009
* SQRT(
POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2)
+ (
COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2)
* POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2)
)
) <= 50)
Join station_district sd
On sd.Id = s.station_district_id
Join year_ref y
On y.station_district_id = sd.id
Join month_ref m
On m.year_ref_id = y.id
Join daily d
On d.month_ref_id = m.id
Where s.elevation Between 0 And 2000
And y.year Between 1980 And 2000
And m.month = 12
And m.category_id = '001'
And d.daily_flag_id <> 'M'
Group By y.year
Since you are not using the station, station_district nor city table in the results, you might be able to move those to an exists statement:
Select avg(d.amount) AS amount, y.year
From year_ref y
Join month_ref m
On m.year_ref_id = y.id
Join daily d
On d.month_ref_id = m.id
Where y.year Between 1980 And 2000
And m.month = 12
And m.category_id = '001'
And d.daily_flag_id <> 'M'
And Exist (
Select 1
From station s1
Join city c1
On c1.id = 10663
Where 6371.009
* SQRT(
POW(RADIANS(c1.latitude_decimal - s1.latitude_decimal), 2)
+ (
COS(RADIANS(c1.latitude_decimal + s1.latitude_decimal) / 2)
* POW(RADIANS(c1.longitude_decimal - s1.longitude_decimal), 2)
)
) <= 50
And S1.station_district_id = y.station_district_id
)
Group By y.year
Upvotes: 3
Reputation: 57787
I don't know what other variations of the query you've tried, but the JOIN on City seems a little strange - have you tried replacing that with a WHERE clause? Also, the relationships between the various tables are currently in the WHERE clause - these are probably best implemented as an INNER JOIN.
Disclaimer: I don't know PostreSQL specifically.
EDIT: Here's a link that describes changing WHERE clauses to JOINs to influence join order, and discusses the join_collapse_limit to force the optimizer to use your specified join order. http://www.postgresql.org/docs/8.2/static/explicit-joins.html
EDIT2: Another alternative is to nest SELECT statements, which may also force the optimizer to construct the query in the (reverse) nesting order you specify.
Upvotes: 1