Reputation: 2896
I have the following scenario. An area has multiple territories, a territory has multiple addresses and an address is visited multiple times in a month. Now I want to generate a monthly report about an area. (How many times an area has been visited).I have written the query but the result set is producing less areas because some addresses are not visited. I have the following structure
tables
areas: id|name
(180 rows) //name is unique
territories: id|name|area_id
(1k rows)
addresses: id|name|territory_id
(80k rows)
visiting_addresses: id|address_id|date|status
(1M+ rows) //status => 1 = visited, 2 = pending
My query is following.
select ar.id as area_id, ar.name as area,
sum(case when va.status = 1 then 1 else 0 end) as visited,
sum(case when va.status = 2 then 1 else 0 end) as pending,
count(va.id) as total
from areas ar
left join territories t on t.area_id=ar.id
left join addresses a on a.territory_id=t.id
left join visiting_addresses va on va.address_id=a.id
where month(va.date) = '01'
and year(va.date)='2020'
group by ar.id
the area table contains 180 areas but the result set shows only 144 areas. Where is my mistake and what is the explanation to this? those areas are missing because they have no visiting.
Upvotes: 0
Views: 435
Reputation: 31772
Your WHERE clause is converting the LEFT JOIN with visiting_addresses
to an INNER JOIN. And since it's the right most table in a LEFT-JOIN-chain, all joins will be converted to INNER JOINS. To prevent that, you should move the corresponding conditions from the WHERE clause to the ON clause:
select ar.id as area_id, ar.name as area,
sum(case when va.status = 1 then 1 else 0 end) as visited,
sum(case when va.status = 2 then 1 else 0 end) as pending,
count(va.id) as total
from areas ar
left join territories t on t.area_id=ar.id
left join addresses a on a.territory_id=t.id
left join visiting_addresses va
on va.address_id=a.id
and month(va.date) = '01'
and year(va.date)='2020'
group by ar.id
But since you have alot of rows, I would rather run two queries. First get only areas with adresses form the last month using inner joins. You should though change your conditions on va.date
to utilize an index:
select ar.id as area_id, ar.name as area,
sum(case when va.status = 1 then 1 else 0 end) as visited,
sum(case when va.status = 2 then 1 else 0 end) as pending,
count(va.id) as total
from areas ar
join territories t on t.area_id=ar.id
join addresses a on a.territory_id=t.id
join visiting_addresses va on va.address_id=a.id
where va.date >= '2020-01-01'
and va.date < '2020-02-01'
group by ar.id
Make sure you have an index on visiting_addresses(date)
or even better on visiting_addresses(date, address_id, status)
.
Then get all areas with a simple
select ar.id as area_id, ar.name as area
from areas ar
and add missing areas to the first result while setting visited
, pending
and total
to zero (in application code).
The INNER JOIN should be much faster, because now the engine can start reading only the necessary rows from the visiting_addresses
using an index for the WHERE conditions.
You could also use a more complex but single query. The Idea is to use a LEFT JOIN with a pre-aggregated subquery:
select ar.id as area_id, ar.name as area,
coalesce(visited, 0) as visited,
coalesce(pending, 0) as pending,
coalesce(total, 0) as total
from areas ar
left join (
select t.area_id
sum(case when va.status = 1 then 1 else 0 end) as visited,
sum(case when va.status = 2 then 1 else 0 end) as pending,
count(va.id) as total
from territories t
join addresses a on a.territory_id=t.id
join visiting_addresses va on va.address_id=a.id
where va.date >= '2020-01-01'
and va.date < '2020-02-01'
group by t.area_id
) x on x.area_id = ar.id
Upvotes: 1
Reputation: 521093
Try moving the logic in the WHERE
clause to the ON
clause of the appropriate join:
SELECT
ar.id AS area_id,
ar.name AS area,
COUNT(CASE WHEN va.status = 1 THEN 1 END) AS visited,
COUNT(CASE WHEN va.status = 2 THEN 1 END) AS pending,
COUNT(va.id) AS total
FROM areas ar
LEFT JOIN territories t ON t.area_id = ar.id
LEFT JOIN addresses a ON a.territory_id = t.id
LEFT JOIN visiting_addresses va ON va.address_id = a.id AND
va.date >= '2020-01-01' AND va.date < '2020-02-01'
GROUP BY
ar.id;
Note that selecting the name
field while only aggregating by id
is valid in MySQL, assuming that id
be a unique field in the areas
table.
You may also try adding the following index to the visiting_addresses
table:
CREATE INDEX date_idx ON visiting_addresses (address_id, date, status);
This might help speed up the join to this table.
Upvotes: 0