Jason Swett
Jason Swett

Reputation: 45074

Two queries are fast separately, slow when joined as subqueries

I have two queries that each, on their own, run pretty quickly (less than 2 seconds). However, when I try to join them as subqueries, it runs ridiculously slowly. The last time I ran it it took about 68 seconds. Here's the full query:

SELECT t.count,
       t.total
  FROM (SELECT t.account_number,
               COUNT(t.id) count,
               SUM(t.amount) total,
               ib.id import_bundle_id
          FROM import_bundle ib
          JOIN generic_import gi ON gi.import_bundle_id = ib.id
          JOIN transaction_import ti ON ti.generic_import_id = gi.id
          JOIN account_transaction t ON t.transaction_import_id = ti.id
          JOIN transaction_code tc ON t.transaction_code_id = tc.id
         WHERE tc.code IN (0, 20, 40)
      GROUP BY t.account_number) t
  JOIN (SELECT a.account_number,
               np.code
          FROM import_bundle ib
          JOIN generic_import gi ON gi.import_bundle_id = ib.id
          JOIN account_import ai ON ai.generic_import_id = gi.id
          JOIN account a ON a.account_import_id = ai.id
          JOIN account_northway_product anp ON anp.account_id = a.id
          JOIN northway_product np ON anp.northway_product_id = np.id
         WHERE np.code != 'O1') a ON t.account_number = a.account_number

That this query should run slowly is not a total surprise. If these were two separate tables and not subqueries, I would put indexes on their account_number columns. However, it's obviously not possible to put indexes on query results, so I can't do that. I suspect that's part of the problem.

Aside from that, I don't understand why the query is slow and I don't have any ideas on how to speed it up, other than adding two summary tables, which I don't want to do if I don't have to.

By the way, this query in English might be, "Get all the POS transactions (codes 0, 20 and 40) for accounts that aren't overdraft protection accounts (code O1)."

Upvotes: 8

Views: 3268

Answers (4)

DRapp
DRapp

Reputation: 48129

Since most of your tables are already joined, why join again... Just tack-on the other tables where the IDs are already set. Additionally, since all are JOINs and none are LEFT joins, it is implied to only get records that are found across ALL joins (since your original query was ultimately joining the result sets too).

Additionally, by adding the AND np.code != '01', it will immediately exclude those entries thus only LEAVING those desired in your query. So this internal "PreQuery" (aliased PQ) does all the work for you. However, your group by does not include the import bundle ID and MAY through a false answer to you. Adjust as needed. Then the result of that is only pulling out two columns... the count and total which would return multiple rows, but no context to the account or totals, but you can adjust as you see fit.

SELECT PQ.count,
       PQ.total
  FROM (SELECT STRAIGHT_JOIN
            t.account_number,
            COUNT(t.id) count,
            SUM(t.amount) total,
            ib.id import_bundle_Id
         FROM 
            transaction_code tc
               join account_transaction t
                  on tc.id = t.transaction_import_id
               join transaction_import ti
                  on t.transaction_import_id = ti.id
               join generic_import gi
                  on ti.generic_import_id = gi.id
               join import_bundle ib
                  on gi.import_bundle_id = ib.id
               join account_import ai 
                  on gi.id = ai.generic_import_id
               join account a
                  on ai.id = a.account_import_id
               join account_northway_product anp
                  on a.id = anp.account_id
               join northway_product np
                  on anp.northway_product_id = np.id
                  AND np.code != '01'
         where
            tc.code in ( 0, 20, 40 )
         group by 
            t.account_number ) PQ

Upvotes: 0

Quassnoi
Quassnoi

Reputation: 425251

Put the check into the main query:

SELECT  t.account_number,
        COUNT(t.id) count,
        SUM(t.amount) total,
        ib.id import_bundle_id
FROM    transaction_code tc
JOIN    account_transaction t
ON      t.transaction_code_id = tc.id
JOIN    transaction_import ti
ON      ti.id = t.transaction_import_id
JOIN    generic_import gi
ON      gi.id = ti.generic_import_id
JOIN    import_bundle ib
ON      ib.id = gi.import_bundle_id
WHERE   tc.code IN (0, 20, 40)
        AND t.account_number NOT IN
        (
        SELECT  anp.id
        FROM    account_northway_product anp
        JOIN    northway_product np
        ON      np.id = anp.northway_product_id
        WHERE   np.code = '01'
        )
GROUP BY
        t.account_number

Create the following indexes:

transaction_code (code)
account_transaction (transaction_code_id)
account_transaction (account_number)

Upvotes: 1

MatBailie
MatBailie

Reputation: 86706

It appears likely that MySQL is incorrectly optimising the query due to the overall complexity.

What you may find is the best option is to run it in stages.

SELECT * INTO #query1 FROM <query1>
SELECT * INTO #query2 FROM <query2>
SELECT * FROM #query1 INNER JOIN #query2 ON <predicate>


Another option may be to use HINTS, though I'm not familiar with them in MySQL. Essentially, find out the plans being generated for each subquery individually, then force the different joins to use NESTED LOOP JOIN, MERGE JOIN, etc. This restricts what the optimiser can do, and therefore get 'wrong', but also limits it's ability to respond as the data statistics change.

Upvotes: 0

mevdiven
mevdiven

Reputation: 1902

Based on what I see from you query;

1) Seems like your PK is account number on import_bundle table. You need clustered index on that column 2) Adding index on (code) northway_product table and (code) on transaction_code table would help as well.

Upvotes: 0

Related Questions