Jason Swett

Reputation: 45094

MySQL query optimization and EXPLAIN for a noob

I've been working with databases for a long time but I'm new to query optimization. I have the following query (some of it code-generated):

SELECT DISTINCT COALESCE(gi.start_time, '') start_time,
COALESCE(b.name, '') bank,
COALESCE(a.id, '') account_id,
COALESCE(a.account_number, '') account_number,
COALESCE(at.code, '') account_type,
COALESCE(a.open_date, '') open_date,
COALESCE(a.interest_rate, '') interest_rate,
COALESCE(a.maturity_date, '') maturity_date,
COALESCE(a.opening_balance, '') opening_balance,
COALESCE(a.has_e_statement, '') has_e_statement,
COALESCE(a.has_bill_pay, '') has_bill_pay,
COALESCE(a.has_overdraft_protection, '') has_overdraft_protection,
COALESCE(a.balance, '') balance,
COALESCE(a.business_or_personal, '') business_or_personal,
COALESCE(a.cumulative_balance, '') cumulative_balance,
COALESCE(c.customer_number, '') customer_number,
COALESCE(c.social_security_number, '') social_security_number,
COALESCE(c.name, '') customer_name,
COALESCE(c.phone, '') phone,
COALESCE(c.deceased, '') deceased,
COALESCE(c.do_not_mail, '') do_not_mail,
COALESCE(cdob.date_of_birth, '') date_of_birth,
COALESCE(ad.line1, '') line1,
COALESCE(ad.line2, '') line2,
COALESCE(ad.city, '') city,
COALESCE(s.name, '') state,
COALESCE(ad.zip, '') zip,
COALESCE(o.officer_number, '') officer_number,
COALESCE(o.name, '') officer_name,
COALESCE(po.line1, '') po_box,
COALESCE(po.city, '') po_city,
COALESCE(po_state.name, '') po_state,
COALESCE(po.zip, '') zip,
COALESCE(br.number, '') branch_number,
COALESCE(cd_type.code, '') cd_type,
COALESCE(mp.product_number, '') macatawa_product_number,
COALESCE(mp.product_name, '') macatawa_product_name,
COALESCE(pt.name, '') macatawa_product_type,
COALESCE(hhsc.name, '') harte_hanks_service_category,
COALESCE(mp.hoh_hierarchy, '') hoh_hierarchy,
COALESCE(cft.name, '') core_file_type,
COALESCE(oa.line1, '') original_address_line1,
COALESCE(oa.line2, '') original_address_line2,
COALESCE(uc.code, '') use_class
            FROM account a
            JOIN customer c ON a.customer_id = c.id
            JOIN officer o ON a.officer_id = o.id
            JOIN account_address aa ON aa.account_id = a.id
       LEFT JOIN account_po_box apb ON apb.account_id = a.id                
            JOIN address ad ON aa.address_id = ad.id
            JOIN original_address oa ON oa.address_id = ad.id
       LEFT JOIN address po ON apb.address_id = po.id
            JOIN state s ON s.id = ad.state_id
       LEFT JOIN state po_state ON po_state.id = po.state_id
       LEFT JOIN branch br ON a.branch_id = br.id
            JOIN account_import ai ON a.account_import_id = ai.id
            JOIN generic_import gi ON gi.id = ai.generic_import_id
            JOIN import_bundle ib ON gi.import_bundle_id = ib.id
            JOIN bank b ON b.id = ib.bank_id
       LEFT JOIN customer_date_of_birth cdob ON cdob.customer_id = c.id
       LEFT JOIN cd_type ON a.cd_type_id = cd_type.id
       LEFT JOIN account_macatawa_product amp ON amp.account_id = a.id
       LEFT JOIN macatawa_product mp ON mp.id = amp.macatawa_product_id
       LEFT JOIN product_type pt ON pt.id = mp.product_type_id
       LEFT JOIN harte_hanks_service_category hhsc
            ON hhsc.id = mp.harte_hanks_service_category_id
       LEFT JOIN core_file_type cft ON cft.id = mp.core_file_type_id
       LEFT JOIN use_class uc ON a.use_class_id = uc.id
       LEFT JOIN account_type at ON a.account_type_id = at.id

         WHERE 1
           AND gi.active = 1
           AND b.id = 8 AND ib.is_finished = 1

        ORDER BY a.id
           LIMIT 10

It's pretty slow. On my dev server it takes about a minute to run, and on my production server, where there's more data, I can't even get it to finish. Here's what an EXPLAIN looks like:

https://i.sstatic.net/eR6lq.png

I know the basics of EXPLAIN. I know that it's good that I have something other than NULL for everything under key. But I don't know, overall, how much room for improvement my query has. I do know that Using temporary; Using filesort under Extra is bad, but I have no idea what to do about it.

Upvotes: 1

Views: 3053

Answers (2)

DRapp

Reputation: 48139

In addition to what @JNK mentioned in his answer about ensuring you have indexes, I have restructured your query and added the "STRAIGHT_JOIN" keyword at the top, which tells the optimizer to join the tables in the order they are listed.

Your query is filtered on the chain from generic import to import bundle to bank, so I've moved THOSE tables to the front of the list. The WHERE clause will pre-qualify THOSE records first instead of looking at all accounts that may never be part of the result. The join order is now reversed, running from the generic import back to the account along the same relationships you started with.

For readability, I've placed each JOIN / ON condition directly under the table it joins against, indented to follow the table relationships. I've also written each ON clause as Table1.ID = JoinedTable.ID. Some of yours were reversed, which is no big deal, but seeing how each table joins INTO the next makes the query easier to read.

So, ensure each table has an index on whatever key column it is joined on. For this query in particular, make sure generic_import (alias gi) has an index on active, and import_bundle (alias ib) has an index on is_finished.

Lastly, your WHERE clause began with WHERE 1 AND ...; the "1" serves no purpose, so I stripped it out.
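As a rough sketch of those index suggestions, something like the following could be run once per table. The table and column names come from the query; the index names (idx_...) are made up for illustration, and some of these indexes may already exist, so check before adding duplicates.

```sql
-- Hypothetical index names; tables and columns are taken from the query above.

-- Filter columns used in the WHERE clause
CREATE INDEX idx_gi_active ON generic_import (active);
CREATE INDEX idx_ib_is_finished ON import_bundle (is_finished);

-- Join keys along the driving chain gi -> ai -> a
CREATE INDEX idx_ai_generic_import_id ON account_import (generic_import_id);
CREATE INDEX idx_a_account_import_id ON account (account_import_id);
```

Whether single-column indexes on low-cardinality flags like active and is_finished actually help depends on your data, so compare the EXPLAIN output before and after.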

SELECT STRAIGHT_JOIN DISTINCT 
      COALESCE(gi.start_time, '') start_time, 
      COALESCE(b.name, '') bank, 
      COALESCE(a.id, '') account_id, 
      COALESCE(a.account_number, '') account_number, 
      COALESCE(at.code, '') account_type, 
      COALESCE(a.open_date, '') open_date, 
      COALESCE(a.interest_rate, '') interest_rate, 
      COALESCE(a.maturity_date, '') maturity_date, 
      COALESCE(a.opening_balance, '') opening_balance, 
      COALESCE(a.has_e_statement, '') has_e_statement, 
      COALESCE(a.has_bill_pay, '') has_bill_pay, 
      COALESCE(a.has_overdraft_protection, '') has_overdraft_protection, 
      COALESCE(a.balance, '') balance, 
      COALESCE(a.business_or_personal, '') business_or_personal, 
      COALESCE(a.cumulative_balance, '') cumulative_balance, 
      COALESCE(c.customer_number, '') customer_number, 
      COALESCE(c.social_security_number, '') social_security_number, 
      COALESCE(c.name, '') customer_name, 
      COALESCE(c.phone, '') phone, 
      COALESCE(c.deceased, '') deceased, 
      COALESCE(c.do_not_mail, '') do_not_mail, 
      COALESCE(cdob.date_of_birth, '') date_of_birth, 
      COALESCE(ad.line1, '') line1, 
      COALESCE(ad.line2, '') line2, 
      COALESCE(ad.city, '') city, 
      COALESCE(s.name, '') state, 
      COALESCE(ad.zip, '') zip, 
      COALESCE(o.officer_number, '') officer_number, 
      COALESCE(o.name, '') officer_name, 
      COALESCE(po.line1, '') po_box, 
      COALESCE(po.city, '') po_city, 
      COALESCE(po_state.name, '') po_state, 
      COALESCE(po.zip, '') zip, 
      COALESCE(br.number, '') branch_number, 
      COALESCE(cd_type.code, '') cd_type, 
      COALESCE(mp.product_number, '') macatawa_product_number, 
      COALESCE(mp.product_name, '') macatawa_product_name, 
      COALESCE(pt.name, '') macatawa_product_type, 
      COALESCE(hhsc.name, '') harte_hanks_service_category, 
      COALESCE(mp.hoh_hierarchy, '') hoh_hierarchy, 
      COALESCE(cft.name, '') core_file_type, 
      COALESCE(oa.line1, '') original_address_line1, 
      COALESCE(oa.line2, '') original_address_line2, 
      COALESCE(uc.code, '') use_class             
   FROM 
      generic_import gi 
         JOIN import_bundle ib 
            ON gi.import_bundle_id = ib.id
            JOIN bank b 
               ON ib.bank_id = b.id 
         JOIN account_import ai 
            ON gi.id = ai.generic_import_id
         JOIN account a
            ON ai.id = a.account_import_id
            JOIN customer c 
               ON a.customer_id = c.id
               LEFT JOIN customer_date_of_birth cdob 
                  ON c.id = cdob.customer_id
            JOIN officer o 
               ON a.officer_id = o.id
            LEFT JOIN branch br 
               ON a.branch_id = br.id
            LEFT JOIN cd_type 
               ON a.cd_type_id = cd_type.id
            LEFT JOIN account_macatawa_product amp 
               ON a.id = amp.account_id
               LEFT JOIN macatawa_product mp 
                  ON amp.macatawa_product_id = mp.id
                  LEFT JOIN product_type pt 
                     ON mp.product_type_id = pt.id
                  LEFT JOIN harte_hanks_service_category hhsc 
                     ON mp.harte_hanks_service_category_id = hhsc.id
                  LEFT JOIN core_file_type cft 
                     ON mp.core_file_type_id = cft.id
            LEFT JOIN use_class uc 
               ON a.use_class_id = uc.id
            LEFT JOIN account_type at 
               ON a.account_type_id = at.id
            JOIN account_address aa 
               ON a.id = aa.account_id 
               JOIN address ad 
                  ON aa.address_id = ad.id 
                  JOIN original_address oa 
                     ON ad.id = oa.address_id
                  JOIN state s 
                     ON ad.state_id = s.id 
            LEFT JOIN account_po_box apb 
               ON a.id = apb.account_id 
               LEFT JOIN address po 
                  ON apb.address_id = po.id
                  LEFT JOIN state po_state 
                     ON po.state_id = po_state.id
      WHERE 
              gi.active = 1
          AND ib.is_finished = 1
          AND b.id = 8 
      ORDER BY 
          a.id
       LIMIT 
          10 

Upvotes: 1

JNK

Reputation: 65157

It looks like you don't have indexes on most of your JOIN fields. Make sure every field that you use as a JOIN key has an index on both tables.

With 23 joins and what looks like only 2 relevant indexes, poor performance can be expected.

With no index to reference, the query engine is checking every row in both tables to compare them, which is obviously very inefficient.

edit:

For example, in your query you have

JOIN customer c ON a.customer_id = c.id

Make sure you have an index on a.customer_id AND customer.id. Having an index on both tables (on the JOINed fields) lets the engine look up matching rows instead of scanning, which can speed up the query dramatically.
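A minimal sketch of how that check might look for this one join. It assumes customer.id is the table's primary key (in which case it is already indexed) and uses a made-up index name.

```sql
-- See which indexes already exist on each side of the join
SHOW INDEX FROM account;
SHOW INDEX FROM customer;

-- Assumption: customer.id is the PRIMARY KEY, so only the
-- foreign-key side needs an index. The index name is hypothetical.
CREATE INDEX idx_account_customer_id ON account (customer_id);

-- Re-check the plan for just this join; the "key" column should
-- now show an index for both tables
EXPLAIN
SELECT a.id
  FROM account a
  JOIN customer c ON a.customer_id = c.id;
```

The same pattern applies to the other joins in the query: check the existing indexes, then add one on whichever join column lacks it.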

Upvotes: 2
