I have a fairly static (InnoDB) table T with four columns: A, B, C and D.
I firstly wish to identify, for a given value of A, which value(s) of B yield unique C across all records. My attempt is as follows:
CREATE PROCEDURE P(x int) BEGIN
SELECT B
FROM T
WHERE A = x
GROUP BY B
HAVING COUNT(DISTINCT C) = COUNT(C);
END
But introducing the GROUP BY dramatically reduces the performance of this query, despite there being an index on column B. Is there a more efficient way, or can I improve the performance of this query somehow?
In response to Daan's comment below, the table was created with the following:
CREATE TABLE T (
A int(11) NOT NULL,
B varchar(45) NOT NULL,
C varchar(255) DEFAULT NULL,
D int(11) NOT NULL,
PRIMARY KEY (A,B,D),
KEY iA (A),
KEY iB (B),
KEY iC (C)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
In response to tombom's comment below, the query is explained as follows:
+----+-------------+-------+------+---------------+---------+---------+-------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows    | Extra                       |
+----+-------------+-------+------+---------------+---------+---------+-------+---------+-----------------------------+
|  1 | SIMPLE      | T     | ref  | PRIMARY,iA    | PRIMARY | 4       | const | 2603472 | Using where; Using filesort |
+----+-------------+-------+------+---------------+---------+---------+-------+---------+-----------------------------+
Upvotes: 0
Views: 346
Reputation: 51868
You can try various approaches:
1.) Create an index over A, B and C like this:
CREATE INDEX iABC ON T(A,B,C);
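To see whether the new index helps, one could re-run EXPLAIN on the original query. This is only a sketch: it assumes the index iABC from above has been created, and the literal 1 is a stand-in for whatever value is passed as x. Whether the optimizer actually picks iABC depends on your data and MySQL version.

```sql
-- Hypothetical check: 1 stands in for the value passed as x.
EXPLAIN
SELECT B
FROM T
WHERE A = 1
GROUP BY B
HAVING COUNT(DISTINCT C) = COUNT(C);
```

If the key column shows iABC and Extra mentions "Using index" without "Using filesort", the query is probably being answered from the index alone.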
Since the problem is most likely the HAVING clause (a varchar(255) column like C isn't great in this case):
2.) Create a (temporary or not) table and then join to it. This might speed things up. A non-temporary table like the following might be faster, since you can create an index on it.
CREATE TABLE foo AS
SELECT
B,
COUNT(DISTINCT C) AS distinctC,
COUNT(C) AS countC
FROM T
GROUP BY B;
CREATE INDEX idx_b ON foo(B);
CREATE INDEX idx_cc ON foo(distinctC, countC);
SELECT T.B
FROM T
INNER JOIN foo ON T.B = foo.B
WHERE A = x
AND foo.distinctC = foo.countC
GROUP BY T.B
ORDER BY NULL; /*see Daan's comment*/
3.) Put the C column in a separate table, where the actual content is identified by an INT.
CREATE TABLE T (
A int(11) NOT NULL,
B varchar(45) NOT NULL,
C int(11) DEFAULT NULL,
PRIMARY KEY (A,B),
KEY iB (B),
KEY iC (C)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE C (
id int(11) NOT NULL,
Ccontent varchar(255) DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Then run the query as usual and, once you have your result, join to table C to translate the ids back to the actual varchar values.
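The translation step could look roughly like the following. This is a sketch against the two tables defined in option 3, with the literal 1 standing in for the value passed as x; the LEFT JOIN is my assumption, chosen because T.C is nullable and an inner join would drop those rows.

```sql
-- Sketch only: resolves the integer ids in T.C back to their text.
-- 1 stands in for the value passed as x.
SELECT T.B, C.Ccontent
FROM T
LEFT JOIN C ON C.id = T.C
WHERE T.A = 1;
```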
I'd prefer option 2. And by the way, your index iA might be useless, since the primary key already starts with A.
Upvotes: 2