Reputation: 97
MOVED TO: MySQL Distinct performance
Basic Idea:
1) I have a Mysql Server with lots of data: 9 tables linked all with foreign keys in more or less linear way.
2) With a GUI I want to extract some results: There are shown these 9 tables and only one variable for each table. Lets say:
Now by marking in table->Frequency->20 the database should check every other table if there are entries which are measured with Freq 20 and update all tables depending on the 20.
BUT: I only want to show distinct values in every table. And this distinct takes 17s, which is very poor for a GUI to wait for.
Example Code:
SELECT wafer.ID
FROM product
JOIN chip ON chip.product_name=product.name
JOIN wafer ON wafer.ID = chip.wafer_ID
JOIN lot ON lot.ID = wafer.lot_ID
JOIN ROI ON ROI.ID_string = chip.ROI_ID
JOIN result ON result.chip_ID = chip.ID_string
JOIN setup ON setup.ID_md5 = result.setup_ID
JOIN dataset ON dataset.ID_md5 = result.dataset_ID
WHERE product.name IN ("GoodProduct")
Duration: 0.34 s fetch: 17 s (1.5e6 rows)
Explain:
id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1 SIMPLE product const PRIMARY,name_UNIQUE PRIMARY 137 const 1 100.00 Using index
1 SIMPLE dataset index PRIMARY,ID_UNIQUE ID_UNIQUE 137 501 100.00 Using index
1 SIMPLE result ref dataset-result_idx,chip_ID_idx,setupID dataset-result_idx 137 databaseName.dataset.ID_md5 159 100.00
1 SIMPLE setup eq_ref PRIMARY PRIMARY 137 databaseName.result.setup_ID 1 100.00 Using index
1 SIMPLE chip eq_ref PRIMARY,ID_UNIQUE,Chip_UNIQUE,product_name_idx,ROI_ID PRIMARY 452 databaseName.result.chip_ID 1 49.99 Using where
1 SIMPLE ROI eq_ref PRIMARY,ID_UNIQUE PRIMARY 302 databaseName.chip.ROI_ID 1 100.00 Using index
1 SIMPLE wafer eq_ref PRIMARY,waferID_UNIQUE,number PRIMARY 62 databaseName.chip.wafer_ID 1 100.00
1 SIMPLE lot eq_ref PRIMARY,lotnumber_UNIQUE PRIMARY 62 databaseName.wafer.lot_ID 1 100.00 Using index
SELECT distinct wafer.ID {...same code as before}
Duration: 23 s fetch: 0.000 s (54 rows)
Explain:
id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1 SIMPLE product const PRIMARY,name_UNIQUE PRIMARY 137 const 1 100.00 Using index; Using temporary
1 SIMPLE dataset index PRIMARY,ID_UNIQUE ID_UNIQUE 137 501 100.00 Using index
1 SIMPLE result ref dataset-result_idx,chip_ID_idx,setupID dataset-result_idx 137 databaseName.dataset.ID_md5 159 100.00
1 SIMPLE setup eq_ref PRIMARY PRIMARY 137 databaseName.result.setup_ID 1 100.00 Using index
1 SIMPLE chip eq_ref PRIMARY,ID_UNIQUE,Chip_UNIQUE,product_name_idx,ROI_ID PRIMARY 452 databaseName.result.chip_ID 1 49.99 Using where
1 SIMPLE ROI eq_ref PRIMARY,ID_UNIQUE PRIMARY 302 databaseName.chip.ROI_ID 1 100.00 Using index
1 SIMPLE wafer eq_ref PRIMARY,waferID_UNIQUE,number PRIMARY 62 databaseName.chip.wafer_ID 1 100.00
1 SIMPLE lot eq_ref PRIMARY,lotnumber_UNIQUE PRIMARY 62 databaseName.wafer.lot_ID 1 100.00 Using index; Distinct
I really wonder why this distinct takes so long. All rows here have indices. This example only shows the code for one table. But I need 9 updating tables.
Is there any way to speed up this process or this "select distinct" query?
Btw: I'm not really capable of understanding the explain. If there is a big hint I wouldn't see it...
Upvotes: 0
Views: 112
Reputation: 142298
Most of those tables do not prove anything to the query. Remove lot
, dataset
and perhaps some more. OTOH, one thing they may be providing is, for example, whether there is a "lot" for the item. That is, won't this give you the desired answer?
SELECT DISTINCT wafer.ID
FROM product
JOIN chip ON chip.product_name = product.name
JOIN wafer ON wafer.ID = chip.wafer_ID
WHERE product.name IN ("GoodProduct")
These indexes might help, if you don't already have them:
product: (name)
result: (dataset_ID, setup_ID, chip_ID)
dataset: (ID_md5)
setup: (ID_md5)
chip: (ID_string, ROI_ID, wafer_ID, product_name)
ROI: (ID_string)
wafer: (lot_ID, ID)
Upvotes: 0
Reputation: 9070
When asking a query performance question, you should show the tables structures and indexes so that it would be easier to help.
You are joining the 8 tables together and the sole limitation you have is that the product name has to be "GoodProduct"
. The product
-table is joined against chip
with the product_name
so you should check if you have indexes on those name
/product_name
-columns. Depending on the number of rows in ROI
and result
, you might need a composite index on those.
Your query formatting is bit complex and hard to read. You simplify things by using format:
SELECT wafer.ID
FROM product
JOIN chip ON chip.product_name=product.name
JOIN wafer ON wafer.ID = chip.wafer_ID
JOIN lot ON lot.ID = wafer.lot_ID
JOIN ROI ON ROI.ID_string = chip.ROI_ID
JOIN result ON result.chip_ID = chip.ID_string
JOIN setup ON setup.ID_md5 = result.setup_ID
JOIN dataset ON dataset.ID_md5 = result.dataset_ID
WHERE product.name IN ("GoodProduct")
Note that tables lot
, ROI
, result
, setup
and dataset
are in the query only for the reason that there needs to be a row on each table that matches the "GoodProduct"
. If this is not a requirement, you could do the query with just product
, chip
and wafer
-tables and the performance would be considerably better.
Upvotes: 1