Langer
Langer

Reputation: 97

MySQL Distinct from subQuery

MOVED TO: MySQL Distinct performance

Basic Idea:

1) I have a Mysql Server with lots of data: 9 tables linked all with foreign keys in more or less linear way.

2) With a GUI I want to extract some results: There are shown these 9 tables and only one variable for each table. Lets say:

Now by marking in table->Frequency->20 the database should check every other table if there are entries which are measured with Freq 20 and update all tables depending on the 20.

BUT: I only want to show distinct values in every table. And this distinct takes 17s, which is very poor for a GUI to wait for.

Example Code:

SELECT wafer.ID
FROM product
  JOIN chip ON chip.product_name=product.name
  JOIN wafer ON wafer.ID = chip.wafer_ID
  JOIN lot ON lot.ID = wafer.lot_ID
  JOIN ROI ON ROI.ID_string = chip.ROI_ID
  JOIN result ON result.chip_ID = chip.ID_string
  JOIN setup ON setup.ID_md5 = result.setup_ID 
  JOIN dataset ON dataset.ID_md5 = result.dataset_ID
WHERE product.name IN ("GoodProduct")

Duration: 0.34 s fetch: 17 s (1.5e6 rows)

Explain:

id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1   SIMPLE  product     const   PRIMARY,name_UNIQUE PRIMARY 137 const   1   100.00  Using index
1   SIMPLE  dataset     index   PRIMARY,ID_UNIQUE   ID_UNIQUE   137     501 100.00  Using index
1   SIMPLE  result      ref dataset-result_idx,chip_ID_idx,setupID  dataset-result_idx  137 databaseName.dataset.ID_md5 159 100.00  
1   SIMPLE  setup       eq_ref  PRIMARY PRIMARY 137 databaseName.result.setup_ID    1   100.00  Using index
1   SIMPLE  chip        eq_ref  PRIMARY,ID_UNIQUE,Chip_UNIQUE,product_name_idx,ROI_ID   PRIMARY 452 databaseName.result.chip_ID 1   49.99   Using where
1   SIMPLE  ROI     eq_ref  PRIMARY,ID_UNIQUE   PRIMARY 302 databaseName.chip.ROI_ID    1   100.00  Using index
1   SIMPLE  wafer       eq_ref  PRIMARY,waferID_UNIQUE,number   PRIMARY 62  databaseName.chip.wafer_ID  1   100.00  
1   SIMPLE  lot     eq_ref  PRIMARY,lotnumber_UNIQUE    PRIMARY 62  databaseName.wafer.lot_ID   1   100.00  Using index

SELECT distinct wafer.ID {...same code as before}

Duration: 23 s fetch: 0.000 s (54 rows)

Explain:

id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1   SIMPLE  product     const   PRIMARY,name_UNIQUE PRIMARY 137 const   1   100.00  Using index; Using temporary
1   SIMPLE  dataset     index   PRIMARY,ID_UNIQUE   ID_UNIQUE   137     501 100.00  Using index
1   SIMPLE  result      ref dataset-result_idx,chip_ID_idx,setupID  dataset-result_idx  137 databaseName.dataset.ID_md5 159 100.00  
1   SIMPLE  setup       eq_ref  PRIMARY PRIMARY 137 databaseName.result.setup_ID    1   100.00  Using index
1   SIMPLE  chip        eq_ref  PRIMARY,ID_UNIQUE,Chip_UNIQUE,product_name_idx,ROI_ID   PRIMARY 452 databaseName.result.chip_ID 1   49.99   Using where
1   SIMPLE  ROI     eq_ref  PRIMARY,ID_UNIQUE   PRIMARY 302 databaseName.chip.ROI_ID    1   100.00  Using index
1   SIMPLE  wafer       eq_ref  PRIMARY,waferID_UNIQUE,number   PRIMARY 62  databaseName.chip.wafer_ID  1   100.00  
1   SIMPLE  lot     eq_ref  PRIMARY,lotnumber_UNIQUE    PRIMARY 62  databaseName.wafer.lot_ID   1   100.00  Using index; Distinct

I really wonder why this distinct takes so long. All rows here have indices. This example only shows the code for one table. But I need 9 updating tables.

Is there any way to speed up this process or this "select distinct" query?

Btw: I'm not really capable of understanding the explain. If there is a big hint I wouldn't see it...

database

Upvotes: 0

Views: 112

Answers (2)

Rick James
Rick James

Reputation: 142298

Most of those tables do not prove anything to the query. Remove lot, dataset and perhaps some more. OTOH, one thing they may be providing is, for example, whether there is a "lot" for the item. That is, won't this give you the desired answer?

SELECT  DISTINCT wafer.ID
    FROM  product
    JOIN  chip  ON chip.product_name = product.name
    JOIN  wafer  ON wafer.ID = chip.wafer_ID
    WHERE  product.name IN ("GoodProduct")

These indexes might help, if you don't already have them:

product:  (name)
result:  (dataset_ID, setup_ID, chip_ID)
dataset:  (ID_md5)
setup:  (ID_md5)
chip:  (ID_string, ROI_ID, wafer_ID, product_name)
ROI:  (ID_string)
wafer:  (lot_ID, ID)

Upvotes: 0

slaakso
slaakso

Reputation: 9070

When asking a query performance question, you should show the tables structures and indexes so that it would be easier to help.

You are joining the 8 tables together and the sole limitation you have is that the product name has to be "GoodProduct". The product-table is joined against chip with the product_name so you should check if you have indexes on those name/product_name-columns. Depending on the number of rows in ROI and result, you might need a composite index on those.

Your query formatting is bit complex and hard to read. You simplify things by using format:

SELECT wafer.ID
FROM product
  JOIN chip ON chip.product_name=product.name
  JOIN wafer ON wafer.ID = chip.wafer_ID
  JOIN lot ON lot.ID = wafer.lot_ID
  JOIN ROI ON ROI.ID_string = chip.ROI_ID
  JOIN result ON result.chip_ID = chip.ID_string
  JOIN setup ON setup.ID_md5 = result.setup_ID 
  JOIN dataset ON dataset.ID_md5 = result.dataset_ID
WHERE product.name IN ("GoodProduct")

Note that tables lot, ROI, result, setup and dataset are in the query only for the reason that there needs to be a row on each table that matches the "GoodProduct". If this is not a requirement, you could do the query with just product, chip and wafer-tables and the performance would be considerably better.

Upvotes: 1

Related Questions