Reputation: 45074
I have two queries that each, on their own, run pretty quickly (less than 2 seconds). However, when I try to join them as subqueries, it runs ridiculously slowly. The last time I ran it it took about 68 seconds. Here's the full query:
SELECT t.count,
t.total
FROM (SELECT t.account_number,
COUNT(t.id) count,
SUM(t.amount) total,
ib.id import_bundle_id
FROM import_bundle ib
JOIN generic_import gi ON gi.import_bundle_id = ib.id
JOIN transaction_import ti ON ti.generic_import_id = gi.id
JOIN account_transaction t ON t.transaction_import_id = ti.id
JOIN transaction_code tc ON t.transaction_code_id = tc.id
WHERE tc.code IN (0, 20, 40)
GROUP BY t.account_number) t
JOIN (SELECT a.account_number,
np.code
FROM import_bundle ib
JOIN generic_import gi ON gi.import_bundle_id = ib.id
JOIN account_import ai ON ai.generic_import_id = gi.id
JOIN account a ON a.account_import_id = ai.id
JOIN account_northway_product anp ON anp.account_id = a.id
JOIN northway_product np ON anp.northway_product_id = np.id
WHERE np.code != 'O1') a ON t.account_number = a.account_number
That this query should run slowly is not a total surprise. If these were two separate tables and not subqueries, I would put indexes on their account_number
columns. However, it's obviously not possible to put indexes on query results, so I can't do that. I suspect that's part of the problem.
Aside from that, I don't understand why the query is slow and I don't have any ideas on how to speed it up, other than adding two summary tables, which I don't want to do if I don't have to.
By the way, this query in English might be, "Get all the POS transactions (codes 0, 20 and 40) for accounts that aren't overdraft protection accounts (code O1)."
Upvotes: 8
Views: 3268
Reputation: 48129
Since most of your tables are already joined, why join again... Just tack-on the other tables where the IDs are already set. Additionally, since all are JOINs and none are LEFT joins, it is implied to only get records that are found across ALL joins (since your original query was ultimately joining the result sets too).
Additionally, by adding the AND np.code != '01', it will immediately exclude those entries thus only LEAVING those desired in your query. So this internal "PreQuery" (aliased PQ) does all the work for you. However, your group by does not include the import bundle ID and MAY through a false answer to you. Adjust as needed. Then the result of that is only pulling out two columns... the count and total which would return multiple rows, but no context to the account or totals, but you can adjust as you see fit.
SELECT PQ.count,
PQ.total
FROM (SELECT STRAIGHT_JOIN
t.account_number,
COUNT(t.id) count,
SUM(t.amount) total,
ib.id import_bundle_Id
FROM
transaction_code tc
join account_transaction t
on tc.id = t.transaction_import_id
join transaction_import ti
on t.transaction_import_id = ti.id
join generic_import gi
on ti.generic_import_id = gi.id
join import_bundle ib
on gi.import_bundle_id = ib.id
join account_import ai
on gi.id = ai.generic_import_id
join account a
on ai.id = a.account_import_id
join account_northway_product anp
on a.id = anp.account_id
join northway_product np
on anp.northway_product_id = np.id
AND np.code != '01'
where
tc.code in ( 0, 20, 40 )
group by
t.account_number ) PQ
Upvotes: 0
Reputation: 425251
Put the check into the main query:
SELECT t.account_number,
COUNT(t.id) count,
SUM(t.amount) total,
ib.id import_bundle_id
FROM transaction_code tc
JOIN account_transaction t
ON t.transaction_code_id = tc.id
JOIN transaction_import ti
ON ti.id = t.transaction_import_id
JOIN generic_import gi
ON gi.id = ti.generic_import_id
JOIN import_bundle ib
ON ib.id = gi.import_bundle_id
WHERE tc.code IN (0, 20, 40)
AND t.account_number NOT IN
(
SELECT anp.id
FROM account_northway_product anp
JOIN northway_product np
ON np.id = anp.northway_product_id
WHERE np.code = '01'
)
GROUP BY
t.account_number
Create the following indexes:
transaction_code (code)
account_transaction (transaction_code_id)
account_transaction (account_number)
Upvotes: 1
Reputation: 86706
It appears likely that MySQL is incorrectly optimising the query due to the overall complexity.
What you may find is the best option is to run it in stages.
SELECT * INTO #query1 FROM <query1>
SELECT * INTO #query2 FROM <query2>
SELECT * FROM #query1 INNER JOIN #query2 ON <predicate>
Another option may be to use HINTS, though I'm not familiar with them in MySQL. Essentially, find out the plans being generated for each subquery individually, then force the different joins to use NESTED LOOP JOIN, MERGE JOIN, etc. This restricts what the optimiser can do, and therefore get 'wrong', but also limits it's ability to respond as the data statistics change.
Upvotes: 0
Reputation: 1902
Based on what I see from you query;
1) Seems like your PK is account number on import_bundle table. You need clustered index on that column 2) Adding index on (code) northway_product table and (code) on transaction_code table would help as well.
Upvotes: 0