MySQL Distinct from subQuery

Question

Basic Idea:

1) I have a Mysql Server with lots of data: 9 tables linked all with foreign keys in more or less linear way.

2) With a GUI I want to extract some results: There are shown these 9 tables and only one variable for each table. Lets say:

Table 1: Frequency: 20,40,80,100
Table 2: Wavelength: 300,400,500,600
Table 3: ....

Now by marking in table->Frequency->20 the database should check every other table if there are entries which are measured with Freq 20 and update all tables depending on the 20.

BUT: I only want to show distinct values in every table. And this distinct takes 17s, which is very poor for a GUI to wait for.

Example Code:

SELECT wafer.ID
FROM product
  JOIN chip ON chip.product_name=product.name
  JOIN wafer ON wafer.ID = chip.wafer_ID
  JOIN lot ON lot.ID = wafer.lot_ID
  JOIN ROI ON ROI.ID_string = chip.ROI_ID
  JOIN result ON result.chip_ID = chip.ID_string
  JOIN setup ON setup.ID_md5 = result.setup_ID 
  JOIN dataset ON dataset.ID_md5 = result.dataset_ID
WHERE product.name IN ("GoodProduct")

Duration: 0.34 s fetch: 17 s (1.5e6 rows)

Explain:

id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1   SIMPLE  product     const   PRIMARY,name_UNIQUE PRIMARY 137 const   1   100.00  Using index
1   SIMPLE  dataset     index   PRIMARY,ID_UNIQUE   ID_UNIQUE   137     501 100.00  Using index
1   SIMPLE  result      ref dataset-result_idx,chip_ID_idx,setupID  dataset-result_idx  137 databaseName.dataset.ID_md5 159 100.00  
1   SIMPLE  setup       eq_ref  PRIMARY PRIMARY 137 databaseName.result.setup_ID    1   100.00  Using index
1   SIMPLE  chip        eq_ref  PRIMARY,ID_UNIQUE,Chip_UNIQUE,product_name_idx,ROI_ID   PRIMARY 452 databaseName.result.chip_ID 1   49.99   Using where
1   SIMPLE  ROI     eq_ref  PRIMARY,ID_UNIQUE   PRIMARY 302 databaseName.chip.ROI_ID    1   100.00  Using index
1   SIMPLE  wafer       eq_ref  PRIMARY,waferID_UNIQUE,number   PRIMARY 62  databaseName.chip.wafer_ID  1   100.00  
1   SIMPLE  lot     eq_ref  PRIMARY,lotnumber_UNIQUE    PRIMARY 62  databaseName.wafer.lot_ID   1   100.00  Using index

SELECT distinct wafer.ID {...same code as before}

Duration: 23 s fetch: 0.000 s (54 rows)

Explain:

id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1   SIMPLE  product     const   PRIMARY,name_UNIQUE PRIMARY 137 const   1   100.00  Using index; Using temporary
1   SIMPLE  dataset     index   PRIMARY,ID_UNIQUE   ID_UNIQUE   137     501 100.00  Using index
1   SIMPLE  result      ref dataset-result_idx,chip_ID_idx,setupID  dataset-result_idx  137 databaseName.dataset.ID_md5 159 100.00  
1   SIMPLE  setup       eq_ref  PRIMARY PRIMARY 137 databaseName.result.setup_ID    1   100.00  Using index
1   SIMPLE  chip        eq_ref  PRIMARY,ID_UNIQUE,Chip_UNIQUE,product_name_idx,ROI_ID   PRIMARY 452 databaseName.result.chip_ID 1   49.99   Using where
1   SIMPLE  ROI     eq_ref  PRIMARY,ID_UNIQUE   PRIMARY 302 databaseName.chip.ROI_ID    1   100.00  Using index
1   SIMPLE  wafer       eq_ref  PRIMARY,waferID_UNIQUE,number   PRIMARY 62  databaseName.chip.wafer_ID  1   100.00  
1   SIMPLE  lot     eq_ref  PRIMARY,lotnumber_UNIQUE    PRIMARY 62  databaseName.wafer.lot_ID   1   100.00  Using index; Distinct

I really wonder why this distinct takes so long. All rows here have indices. This example only shows the code for one table. But I need 9 updating tables.

Is there any way to speed up this process or this "select distinct" query?

Btw: I'm not really capable of understanding the explain. If there is a big hint I wouldn't see it...

database

slaakso · Accepted Answer

When asking a query performance question, you should show the tables structures and indexes so that it would be easier to help.

You are joining the 8 tables together and the sole limitation you have is that the product name has to be "GoodProduct". The product-table is joined against chip with the product_name so you should check if you have indexes on those name/product_name-columns. Depending on the number of rows in ROI and result, you might need a composite index on those.

Your query formatting is bit complex and hard to read. You simplify things by using format:

SELECT wafer.ID
FROM product
  JOIN chip ON chip.product_name=product.name
  JOIN wafer ON wafer.ID = chip.wafer_ID
  JOIN lot ON lot.ID = wafer.lot_ID
  JOIN ROI ON ROI.ID_string = chip.ROI_ID
  JOIN result ON result.chip_ID = chip.ID_string
  JOIN setup ON setup.ID_md5 = result.setup_ID 
  JOIN dataset ON dataset.ID_md5 = result.dataset_ID
WHERE product.name IN ("GoodProduct")

Note that tables lot, ROI, result, setup and dataset are in the query only for the reason that there needs to be a row on each table that matches the "GoodProduct". If this is not a requirement, you could do the query with just product, chip and wafer-tables and the performance would be considerably better.

MySQL Distinct from subQuery

Answers (2)

Related Questions