Reputation: 3501
I have this query:
SELECT DISTINCT
t1.`signature_id` AS id1,
t2.`signature_id` AS id2,
COUNT(DISTINCT t3.serial) AS weight
FROM `gc_con_sig` AS t1
INNER JOIN `gc_con_sig` AS t2
ON ((t1.`signature_id` != t2.`signature_id`)
AND (t1.`petition_id` = t2.`petition_id`))
INNER JOIN `wtp_data_petitions` AS t3
ON (t3.`serial` = t1.`petition_serial`)
GROUP BY t1.`signature_id`, t2.`signature_id`
HAVING weight > 0;
It essentially get the permutations of signature_ids, and the number of petitions they've both signed (weight).
That I'm trying to run against this table (gc_con_sig):
`petition_id` varchar(64) NOT NULL DEFAULT '' COMMENT 'Petition ID defined by API',
`signature_id` varchar(34) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`petition_serial` int(11) DEFAULT NULL,
KEY `signature_id` (`signature_id`),
KEY `petition_id` (`petition_id`),
KEY `signature_petition_idx` (`signature_id`,`petition_id`),
KEY `pcidx` (`petition_id`,`signature_id`),
KEY `sig_pet_ser_idx` (`petition_serial`)
This is the explain I get:
+----+-------------+-------+--------+--------------------------------------------------------+---------+---------+------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------------------------------------+---------+---------+------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | t1 | ALL | petition_id,pcidx,sig_pet_ser_idx | NULL | NULL | NULL | 200659 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY | PRIMARY | 4 | wtp.t1.petition_serial | 1 | Using index |
| 1 | SIMPLE | t2 | ref | petition_id,pcidx | pcidx | 194 | wtp.t1.petition_id | 5016 | Using where; Using index |
+----+-------------+-------+--------+--------------------------------------------------------+---------+---------+------------------------+--------+----------------------------------------------+
I've optimized the various mysql configurations mysqltuner has told me to, but this query doesn't run (at least within an hour) on a machine with 17GB ram (12GB allocated to mysql).
Any ideas?
Upvotes: 1
Views: 54
Reputation: 1269993
Can signatures be on multiple petitions? Can serial
be NULL
?
Assuming the answers are "no" to both questions, you might try:
SELECT t1.`signature_id` AS id1, t2.`signature_id` AS id2,
COUNT(*) AS weight
FROM `gc_con_sig` t1 INNER JOIN
`gc_con_sig` t2
ON (t1.`signature_id` != t2.`signature_id`) AND
(t1.`petition_id` = t2.`petition_id`)
GROUP BY t1.`signature_id`, t2.`signature_id`;
The count(distinct serial)
is counting the non-NULL values in the field. If all values are not NULL
and there are no duplicates, then this is equivalent to count(*)
.
The having clause is not needed because the on
clause basically guarantees that there is at least one match.
And, finally, select distinct
is never needed when you are using a group by
correctly.
Upvotes: 1