user6799565
user6799565

Reputation: 21

Optimization select group by

I have this query

SELECT 
    DC.variationID, COUNT(DISTINCT(DC.userID)) AS conversion
FROM 
    XXXX AS DC
WHERE 
    DC.testID = 'XXXX' AND DC.visit > 1
GROUP BY 
    DC.variationID

This is the description of the table

CREATE TABLE `XXXX` 
(
   `id` bigint(20) NOT NULL AUTO_INCREMENT,
   `userID` bigint(17) NOT NULL,
   `testID` bigint(20) NOT NULL,
   `variationID` bigint(20) NOT NULL,
   `url` bigint(20) NOT NULL,
   `time` bigint(20) NOT NULL,
   `visit` bigint(20) NOT NULL DEFAULT '1',
   `isTestPage` tinyint(1) NOT NULL,
   PRIMARY KEY (`id`,`testID`),
   KEY `url` (`url`),
   KEY `dc3_testIDPage` (`testID`,`url`),
   KEY `testid_istest` (`testID`,`isTestPage`),
   KEY `dc3_varIDPage` (`variationID`,`url`),
   KEY `index_rebond` (`testID`,`visit`,`variationID`),
   KEY `dc3_testIDvarIDPage` (`testID`,`variationID`,`url`),
   KEY `isTestPage2` (`variationID`,`isTestPage`,`visit`,`userID`),
   KEY `user_test_varID_url` (`userID`,`testID`,`variationID`,`url`)
 ) ENGINE=InnoDB  DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED

This is the explain of the query

# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
'1', 'SIMPLE', 'DC', NULL, 'ref', 'dc3_testIDPage,testid_istest,dc3_varIDPage,index_rebond,dc3_testIDvarIDPage,isTestPage2,user_test_varID_url', 'dc3_testIDvarIDPage', '8', 'const', '13695309', '33.33', 'Using index condition; Using where'

For me the query should use the index 'index_rebond' but unfortunatelly it not use it.I pretty sure that before the query use the index 'index_rebond'.

The query takes a lot of time to complete. Can you please tell me why the query doesn't use the index 'index_rebond' and what is the best way to optimize the query?

This is the result of show index

# Table, Non_unique, Key_name, Seq_in_index, Column_name, Collation, Cardinality, Sub_part, Packed, Null, Index_type, Comment, Index_comment
'datacollect_v3', '0', 'PRIMARY', '1', 'id', 'A', '25909280', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '0', 'PRIMARY', '2', 'testID', 'A', '25909280', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'url', '1', 'url', 'A', '1657369', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_testIDPage', '1', 'testID', 'A', '2167', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_testIDPage', '2', 'url', 'A', '1850256', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'testid_istest', '1', 'testID', 'A', '3813', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'testid_istest', '2', 'isTestPage', 'A', '5721', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_varIDPage', '1', 'variationID', 'A', '2053', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_varIDPage', '2', 'url', 'A', '4171834', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'index_rebond', '1', 'testID', 'A', '1811', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'index_rebond', '2', 'visit', 'A', '11357', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'index_rebond', '3', 'variationID', 'A', '17208', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_testIDvarIDPage', '1', 'testID', 'A', '2049', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_testIDvarIDPage', '2', 'variationID', 'A', '3513', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_testIDvarIDPage', '3', 'url', 'A', '929052', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'isTestPage2', '1', 'variationID', 'A', '1891', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'isTestPage2', '2', 'isTestPage', 'A', '3309', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'isTestPage2', '3', 'visit', 'A', '16172', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'isTestPage2', '4', 'userID', 'A', '2712038', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'user_test_varID_url', '1', 'userID', 'A', '1103566', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'user_test_varID_url', '2', 'testID', 'A', '1336479', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'user_test_varID_url', '3', 'variationID', 'A', '1325388', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'user_test_varID_url', '4', 'url', 'A', '16936138', NULL, NULL, '', 'BTREE', '', ''

Best regards,

Upvotes: 0

Views: 46

Answers (2)

Rick James
Rick James

Reputation: 142278

PRIMARY KEY (id, testID) does not make sense unless you are partitioning by testID. Shouldn't you have PRIMARY KEY (id)?

If you have some combination of columns that is naturally unique, consider getting rid of id and using that combo as the PK.

This will be "covering", hence probably significantly faster:

INDEX(testID, visit, variationID, userID)

(And get rid of index_rebond since it will be redundant.) isTestPage2 is also covering (because the PK is included), but it is ordered inefficiently.

Unless you really expect to have more than 4 billion of the various ids, recommend you switch them to INT UNSIGNED (here and in the other tables). Cutting this table and its indexes in half will help performance, especially if the dataset is too big to be cached.

Upvotes: 1

ScaisEdge
ScaisEdge

Reputation: 133360

Could be the query optimizer prefer dc3_testIDPage or testid_istest because they also start with testID

but you can force the use of an index

 SELECT DC.variationID, COUNT(DISTINCT(DC.userID)) AS conversion
                     FROM XXXX AS DC force index_rebond
                     WHERE DC.testID = 'XXXX' AND DC.visit > 1
                     GROUP BY DC.variationID

Upvotes: 0

Related Questions