Andrew
Andrew

Reputation: 2154

Translating subquery to left join in sqlite

I have a query that is running against a SQLite database that uses a couple of subqueries. In order to accommodate some new requirements, I need to translate it to use joins instead. Below is the structure version of the original query:

SELECT c.id AS category_id, b.budget_year,
(
    SELECT sum(actual)
    FROM lines l1
    WHERE status = 'complete'
    AND category_id = c.id
    AND billing_year = b.budget_year
) AS actual
(
    SELECT sum(planned)
    FROM lines l2
    WHERE status IN ('forecasted', 'in-progress')
    AND category_id = c.id
    AND billing_year = b.budget_year
) AS rough_proposed
FROM categories AS c
LEFT OUTER JOIN budgets AS b ON (c.id = b.category_id)
GROUP BY c.id, b.budget_year;

The next query is my first attempt to convert it to use LEFT OUTER JOINs:

SELECT c.id AS category_id, b.budget_year, sum(l1.actual) AS actual, sum(l2.planned) AS planned
FROM categories AS c
LEFT OUTER JOIN budgets AS b ON (c.id = b.category_id)
LEFT OUTER JOIN lines AS l1 ON (l1.category_id = c.id
    AND l1.billing_year = b.budget_year
    AND l1.status = 'complete')
LEFT OUTER JOIN lines AS l2 ON (l2.category_id = c.id
    AND l2.billing_year = b.budget_year
    AND l2.status IN ('forecasted', 'in-progress'))
GROUP BY c.id, b.budget_year;

However, the actual and rough_proposed columns are much larger than expected. I am no SQL expert, and I am having a hard time understanding what is going on here. Is there a straightforward way to convert the subqueries to joins?

Upvotes: 1

Views: 74

Answers (1)

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726579

There is a problem with both your queries. However, the first query hides the problem, while the second query makes it visible.

Here is what's going on: you join lines twice - once as l1 and once more as l2. The query before grouping would have the same line multiple times when there are both actual lines and forecast-ed / in-progress lines. When this happens, each line would be counted multiple times, resulting in inflated values.

The first query hides this, because it does not apply aggregation to actual and rough_proposed columns. SQLite picks the first entry for each group, which has the correct value.

You can fix your query by joining to lines only once, and counting the amounts conditionally, like this:

SELECT
    c.id AS category_id
,   b.budget_year
,   SUM(CASE WHEN l.status = 'complete' THEN l.actual END) AS actual
,   SUM(CASE WHEN l.status IN ('forecasted', 'in-progress') THEN l.planned END) AS planned
FROM categories AS c
LEFT OUTER JOIN budgets AS b ON (c.id = b.category_id)
LEFT OUTER JOIN lines AS l ON (l.category_id = c.id AND l1.billing_year = b.budget_year)
GROUP BY c.id, b.budget_year

In this new query each row from lines is brought in only once; the decision to count it in one of the actual/planned columns is made inside the conditional expression embedded in the SUM aggregating function.

Upvotes: 1

Related Questions