Reputation: 375
I'm trying to optimize this query it returns multiple rows from the building_rent_prices and the building_weather and then groups them and calculates the average of their field. So far the tables are all under a million rows yet it takes several seconds, does anyone know how i could optimize this from composite indexes or rewriting the query? I'm assuming it should be able to be a 100ms or quicker query but so far it seems like it cant
SELECT b.*
, AVG(r.rent)
, AVG(w.high_temp)
FROM buildings b
LEFT
JOIN building_rent_prices r
ON r.building_id = b.building_id
LEFT
JOIN building_weather w
ON w.building_id = b.building_id
WHERE w.date BETWEEN CURDATE() AND CURDATE + INTERVAL 4 DAY
AND r.date BETWEEN CURDATE() AND CURDATE + INTERVAL 10 day
GROUP
BY b.building_id
ORDER
BY AVG(r.rent) / b.square_feet DESC
LIMIT 10;
Explain said the following:
1 SIMPLE building_rent_prices range
1 SIMPLE buildings eq_ref
1 SIMPLE building_weather ref
Using where; Using index; Using temporary; Using filesort
Using where
Using where; Using index
Im working on some test data heres the create table
CREATE TABLE building(
building_id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255),
square_feet INT
);
CREATE TABLE building_weather(
building_weather_id INT PRIMARY KEY AUTO_INCREMENT,
building_id INT,
weather_date DATE,
high_temp INT
);
CREATE TABLE building_rates(
building_rate_id INT PRIMARY KEY AUTO_INCREMENT,
building_id INT,
weather_date DATE,
rate double
);
ALTER TABLE building_rates INDEX(building_id);
ALTER TABLE buildings INDEX(building_id);
ALTER TABLE building_weather INDEX(building_id);
This seems to working in under 1 second based on DRapp's answer without indexes(I still need to test that its valid)
select
B.*,
BRP.avgRent,
BW.avgTemp
from
( select building_id,
AVG( rent ) avgRent
from
building_rent_prices
where
date BETWEEN CURDATE() AND CURDATE() + 10
group by
building_id
order by
building_id ) BRP
JOIN buildings B
on BRP.building_id = B.building_id
left join ( select building_id,
AVG( hi_temp ) avgTemp
from building_weather
where date BETWEEN CURDATE() AND CURDATE() + 10
group by building_id) BW
on BRP.building_id = BW.building_id
GROUP BY BRP.building_id
ORDER BY BRP.avgRent / 1 DESC
LIMIT 10;
Upvotes: 0
Views: 53
Reputation: 375
For anyone who has issues similar to mine the solution is to GROUP each table you would like to join using building_id that way you are joining one to one with every average. Ollie Jones query with JOIN rather than LEFT JOIN is the closest answer if you do not want results that don't have data in all tables. Also The main issue I had was that I forgot to place an index on a avg(low_temp) column so the INDEXES. What I learned from this is that if you do an aggregated function in your select it belongs in your indexes. I added low_temp to it.
building_weather (date, building_id, hi_temp, low_temp) AS suggested by Ollie and DR APP
ALTER TABLE building_weather ADD index(date, building_id, hi_temp, low_temp);
SELECT buildings.*, a.rent, b.high_temp, b.low_temp
FROM buildings
JOIN (
SELECT building_id, AVG(rent) rent
FROM building_rent_prices
WHERE date BETWEEN CURDATE() AND CURDATE() + INTERVAL 10 DAY
GROUP BY building_id
) AS a ON buildings.building_id = a.building_id
JOIN (
SELECT building_id, AVG(high_temp) high_temp, AVG(low_temp) low_temp
FROM building_weather
WHERE date BETWEEN CURDATE() AND CURDATE() + INTERVAL 4 DAY
GROUP BY building_id
) AS b ON buildings.building_id = b.building_id
ORDER BY a.rent / buildings.square_feet DESC
LIMIT 10
Upvotes: 0
Reputation: 48139
First, your query to the WEATHER based table is only for 4 days, the RENT PRICES table is for 10 days. Since you don't have any join correlation between the two, you will result in a Cartesian result of 40 records per one building ID. Was that intentional or just not identified as an oops...
Second, I would adjust the query as I have below, but also, I have adjusted BOTH WEATHER and RENT PRICES tables to reflect the same date range period. I start with an sub query of just the prices and group by building and date, then join to buildings, then another sub query to weather grouped by building and date. But here, I join from the rent prices sub query to the weather sub query on both building ID AND date so it will at most retain a 1:1 ratio. I don't know why weather is even a consideration spanning date ranges.
However to help with indexes, I would suggest the following
Table Index on
buildings (Building_ID) <-- probably already exists as a PK
building_rent_prices (date, building_id, rent)
building_weather (date, building_id, hi_temp)
The purpose of the index is to take advantage of the WHERE clause (date first), THEN the GROUP BY ( building ID), and is a COVERING INDEX (includes the rent). Similarly for the building weather table for same reasons.
select
B.*,
BRP.avgRent,
BW.avgTemp
from
( select building_id,
AVG( rent ) avgRent
from
building_rent_prices
where
date BETWEEN CURDATE() AND CURDATE() + INTERVAL 10 DAY
group by
building_id
order by
building_id ) BRP
JOIN buildings B
on BRP.building_id = B.building_id
left join ( select building_id,
AVG( hi_temp ) avgTemp
from
building_weather
where
date BETWEEN CURDATE() AND CURDATE() + INTERVAL 10 DAY
group by
building_id ) BW
on BRP.building_id = BW.building_id
GROUP BY
BRP.building_id
ORDER BY
BRP.avgRent / B.square_feet DESC
LIMIT 10;
CLARIFICATION...
I cant guarantee the execution order, but in essence, the two ( queries ) for BPR and BW aliases, they would be done and executed quickly before any join took place. If you wanted the average across the (in my example) 10 days vs a per-day join, then I have removed the "date" as a component of the group, so each will return respectively at most, 1 per building.
Now, joining to the building table on just the 1:1:1 ratio will limit the records in the final result set. This should take care of your concern of the average over those days in question.
Upvotes: 1
Reputation: 108651
Let's take a look at this query in detail. You want to report two different kinds of averages for each building. You need to compute those in separate subqueries. If you don't you'll get the cartesian combinatorial explosion.
One is an average of eleven days' worth of rent prices. You get that data with this subquery:
SELECT building_id, AVG(rent) rent
FROM building_rent_prices
WHERE date BETWEEN CURDATE() AND CURDATE() + INTERVAL 10 DAY
GROUP BY building_id
This subquery can be optimized by a compound covering index on building_rent_prices
, consisting of (date, building_id, rent)
.
The next is an average of five days' worth of temperature.
SELECT building_id, AVG(high_temp) high_temp
FROM building_weather
WHERE date BETWEEN CURDATE() AND CURDATE() + INTERVAL 4 DAY
GROUP BY building_id
This can be optimized by a compound covering index on building_weather
, consisting of (date, building_id, high_temp)
.
Finally, you need to join these two subqueries to your buildings
table to generate the final result set.
SELECT buildings.*, a.rent, b.high_temp
FROM buildings
LEFT JOIN (
SELECT building_id, AVG(rent) rent
FROM building_rent_prices
WHERE date BETWEEN CURDATE() AND CURDATE() + INTERVAL 10 DAY
GROUP BY building_id
) AS a ON buildings.building_id = a.building_id
LEFT JOIN (
SELECT building_id, AVG(high_temp) high_temp
FROM building_weather
WHERE date BETWEEN CURDATE() AND CURDATE() + INTERVAL 4 DAY
GROUP BY building_id
) AS b ON buildings.building_id = b.building_id
ORDER BY a.rent / buildings.square_feet DESC
LIMIT 10
Once the two subqueries are optimized, this one doesn't need anything except the building_id
primary key.
In summary, to speed up this query, create the two compound indexes mentioned on the building_rent_prices
and building_weather
queries.
Upvotes: 1
Reputation: 142298
Don't use CURDATE + 4:
mysql> select CURDATE(), CURDATE() + 30, CURDATE() + INTERVAL 30 DAY;
+------------+----------------+-----------------------------+
| CURDATE() | CURDATE() + 30 | CURDATE() + INTERVAL 30 DAY |
+------------+----------------+-----------------------------+
| 2015-03-15 | 20150345 | 2015-04-14 |
+------------+----------------+-----------------------------+
Add INDEX(building_id)
to the second and third tables.
If those don't fix it; come back with a revised query and schema, and I will look deeper.
Upvotes: 1