Reputation: 2154
I have a query that is running against a SQLite database that uses a couple of subqueries. In order to accommodate some new requirements, I need to translate it to use joins instead. Below is the structure version of the original query:
SELECT c.id AS category_id, b.budget_year,
(
SELECT sum(actual)
FROM lines l1
WHERE status = 'complete'
AND category_id = c.id
AND billing_year = b.budget_year
) AS actual
(
SELECT sum(planned)
FROM lines l2
WHERE status IN ('forecasted', 'in-progress')
AND category_id = c.id
AND billing_year = b.budget_year
) AS rough_proposed
FROM categories AS c
LEFT OUTER JOIN budgets AS b ON (c.id = b.category_id)
GROUP BY c.id, b.budget_year;
The next query is my first attempt to convert it to use LEFT OUTER JOIN
s:
SELECT c.id AS category_id, b.budget_year, sum(l1.actual) AS actual, sum(l2.planned) AS planned
FROM categories AS c
LEFT OUTER JOIN budgets AS b ON (c.id = b.category_id)
LEFT OUTER JOIN lines AS l1 ON (l1.category_id = c.id
AND l1.billing_year = b.budget_year
AND l1.status = 'complete')
LEFT OUTER JOIN lines AS l2 ON (l2.category_id = c.id
AND l2.billing_year = b.budget_year
AND l2.status IN ('forecasted', 'in-progress'))
GROUP BY c.id, b.budget_year;
However, the actual
and rough_proposed
columns are much larger than expected. I am no SQL expert, and I am having a hard time understanding what is going on here. Is there a straightforward way to convert the subqueries to joins?
Upvotes: 1
Views: 74
Reputation: 726579
There is a problem with both your queries. However, the first query hides the problem, while the second query makes it visible.
Here is what's going on: you join lines
twice - once as l1
and once more as l2
. The query before grouping would have the same line multiple times when there are both actual lines and forecast-ed / in-progress lines. When this happens, each line would be counted multiple times, resulting in inflated values.
The first query hides this, because it does not apply aggregation to actual
and rough_proposed
columns. SQLite picks the first entry for each group, which has the correct value.
You can fix your query by joining to lines only once, and counting the amounts conditionally, like this:
SELECT
c.id AS category_id
, b.budget_year
, SUM(CASE WHEN l.status = 'complete' THEN l.actual END) AS actual
, SUM(CASE WHEN l.status IN ('forecasted', 'in-progress') THEN l.planned END) AS planned
FROM categories AS c
LEFT OUTER JOIN budgets AS b ON (c.id = b.category_id)
LEFT OUTER JOIN lines AS l ON (l.category_id = c.id AND l1.billing_year = b.budget_year)
GROUP BY c.id, b.budget_year
In this new query each row from lines
is brought in only once; the decision to count it in one of the actual
/planned
columns is made inside the conditional expression embedded in the SUM
aggregating function.
Upvotes: 1