anon

Optimise (MySQL) SELECT .. GROUP BY performance

I have a fairly static (InnoDB) table T with four columns: A, B, C and D.

First, I wish to identify, for a given value of A, which value(s) of B yield unique C values across all of their records. My attempt is as follows:

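DELIMITER $$
CREATE PROCEDURE P(x INT)
BEGIN
    -- Keep the B values for which no non-NULL C value repeats (COUNT ignores NULLs)
    SELECT   B
    FROM     T
    WHERE    A = x
    GROUP BY B
    HAVING   COUNT(DISTINCT C) = COUNT(C);
END$$
DELIMITER ;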

But introducing the GROUP BY dramatically degrades the performance of this query, despite there being an index on column B. Is there a more efficient approach, or can I improve the performance of this query somehow?
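To make the HAVING test concrete, here is a small self-contained sketch on toy data. The table name T_demo and its rows are hypothetical, and the SQL is kept to a portable subset so it should also run in SQLite. Note that COUNT ignores NULLs, so a B whose C values are all NULL passes the test (0 = 0).

```sql
-- Hypothetical toy version of T; only the column layout matches the question.
DROP TABLE IF EXISTS T_demo;
CREATE TABLE T_demo (
    A INT NOT NULL,
    B VARCHAR(45) NOT NULL,
    C VARCHAR(255) DEFAULT NULL,
    D INT NOT NULL,
    PRIMARY KEY (A, B, D)
);

INSERT INTO T_demo (A, B, C, D) VALUES
    (1, 'b1', 'x',  1),  -- b1: C values x, y -> all distinct
    (1, 'b1', 'y',  2),
    (1, 'b2', 'x',  1),  -- b2: C values x, x -> duplicate
    (1, 'b2', 'x',  2),
    (1, 'b3', NULL, 1),  -- b3: only NULL -> COUNT(DISTINCT C) = COUNT(C) = 0
    (2, 'b2', 'z',  1);  -- different A, filtered out below

-- Same logic as the body of procedure P, with x = 1.
SELECT   B
FROM     T_demo
WHERE    A = 1
GROUP BY B
HAVING   COUNT(DISTINCT C) = COUNT(C)
ORDER BY B;
-- Returns b1 and b3.
```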


In response to Daan's comment below, the table was created with the following:

CREATE TABLE T (
    A int(11) NOT NULL,
    B varchar(45) NOT NULL,
    C varchar(255) DEFAULT NULL,
    D int(11) NOT NULL,
    PRIMARY KEY (A,B,D),
    KEY iA (A),
    KEY iB (B),
    KEY iC (C)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

In response to tombom's comment below, here is the EXPLAIN output for the query:

+----+-------------+-------+------+---------------+---------+---------+-------+---------+-----------------------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows    | Extra                       |
+----+-------------+-------+------+---------------+---------+---------+-------+---------+-----------------------------+
| 1  | SIMPLE      | T     | ref  | PRIMARY,iA    | PRIMARY | 4       | const | 2603472 | Using where; Using filesort |
+----+-------------+-------+------+---------------+---------+---------+-------+---------+-----------------------------+

Upvotes: 0

Views: 346

Answers (2)

fancyPants

Reputation: 51868

You can try various approaches:

1.) Create a composite index over A, B and C, like this:

CREATE INDEX iABC ON T(A,B,C);
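With iABC in place, one variant that may be worth benchmarking (an untested suggestion, not a guaranteed win) replaces COUNT(DISTINCT C) with a two-level GROUP BY: first count rows per (B, C), then keep the B values where no non-NULL C occurs more than once. After the A = x filter, the inner GROUP BY B, C follows the column order of iABC, which may let MySQL read the groups in index order instead of doing the filesort seen in the EXPLAIN output. The toy table T_demo2 and its rows are hypothetical; the CASE treats the NULL group as harmless, mirroring the fact that COUNT(C) ignores NULLs.

```sql
-- Hypothetical toy table with the same layout as T, plus an iABC-style index.
DROP TABLE IF EXISTS T_demo2;
CREATE TABLE T_demo2 (
    A INT NOT NULL,
    B VARCHAR(45) NOT NULL,
    C VARCHAR(255) DEFAULT NULL,
    D INT NOT NULL,
    PRIMARY KEY (A, B, D)
);
CREATE INDEX iABC_demo2 ON T_demo2 (A, B, C);

INSERT INTO T_demo2 (A, B, C, D) VALUES
    (1, 'b1', 'x',  1),
    (1, 'b1', 'y',  2),
    (1, 'b2', 'x',  1),
    (1, 'b2', 'x',  2),
    (1, 'b3', NULL, 1),
    (1, 'b3', NULL, 2),  -- repeated NULLs are not duplicates for COUNT(C)
    (2, 'b2', 'z',  1);

-- Step 1 (inner query): count rows per (B, C) for the given A.
-- Step 2 (outer query): keep B if no non-NULL C occurs more than once.
SELECT   per_c.B
FROM     (SELECT   B, C, COUNT(*) AS n
          FROM     T_demo2
          WHERE    A = 1
          GROUP BY B, C) AS per_c
GROUP BY per_c.B
HAVING   MAX(CASE WHEN per_c.C IS NULL THEN 1 ELSE per_c.n END) = 1
ORDER BY per_c.B;
-- Returns b1 and b3, the same rows as the COUNT(DISTINCT C) version.
```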

Since the problem is most likely the HAVING clause (COUNT(DISTINCT C) over a varchar(255) column isn't cheap), you could also try:

2.) Precompute the counts in a separate table (temporary or not) and then join to it. This might speed things up. A regular, non-temporary table like the following might be faster overall, since it can be indexed once and reused across calls for as long as T stays unchanged.

CREATE TABLE foo AS
SELECT
    A,
    B,
    COUNT(DISTINCT C) AS distinctC,
    COUNT(C) AS countC
FROM T
GROUP BY A, B; -- per (A, B), matching the WHERE A = x filter of the original query

CREATE INDEX idx_ab ON foo(A, B);
CREATE INDEX idx_cc ON foo(distinctC, countC);

SELECT   T.B
FROM     T
INNER JOIN foo ON T.A = foo.A AND T.B = foo.B
WHERE    T.A = x
AND      foo.distinctC = foo.countC
GROUP BY T.B
ORDER BY NULL; /* see Daan's comment */
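Here is a self-contained sketch of the precomputation idea on hypothetical toy data (T_demo3 and foo_demo are made-up names). It keeps A in the precomputed table, so the counts are per (A, B) just like in procedure P, and the per-call query becomes a lookup on the small table rather than a GROUP BY over T.

```sql
DROP TABLE IF EXISTS foo_demo;
DROP TABLE IF EXISTS T_demo3;
CREATE TABLE T_demo3 (
    A INT NOT NULL,
    B VARCHAR(45) NOT NULL,
    C VARCHAR(255) DEFAULT NULL,
    D INT NOT NULL,
    PRIMARY KEY (A, B, D)
);
INSERT INTO T_demo3 (A, B, C, D) VALUES
    (1, 'b1', 'x',  1), (1, 'b1', 'y', 2),
    (1, 'b2', 'x',  1), (1, 'b2', 'x', 2),
    (1, 'b3', NULL, 1),
    (2, 'b2', 'z',  1);

-- Precompute the per-(A, B) counts once; rebuild this table whenever T changes.
CREATE TABLE foo_demo AS
SELECT   A,
         B,
         COUNT(DISTINCT C) AS distinctC,
         COUNT(C)          AS countC
FROM     T_demo3
GROUP BY A, B;

CREATE INDEX idx_foo_demo_ab ON foo_demo (A, B);

-- Per-call query: filter the precomputed rows for one A.
SELECT   B
FROM     foo_demo
WHERE    A = 1
AND      distinctC = countC
ORDER BY B;
-- Returns b1 and b3.
```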

3.) Move the C values into a separate lookup table and reference them from T by an INT id.

CREATE TABLE T (
    A int(11) NOT NULL,
    B varchar(45) NOT NULL,
    C int(11) DEFAULT NULL, -- id of the row in table C
    D int(11) NOT NULL,
    PRIMARY KEY (A,B,D),
    KEY iB (B),
    KEY iC (C)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


CREATE TABLE C (
    id int(11) NOT NULL,
    Ccontent varchar(255) DEFAULT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Then run your queries as usual against the integer C column, and only join to table C at the end, once you have your result, to translate the ids back into the actual varchar values.
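A minimal sketch of option 3 on hypothetical toy data (T_demo4 and C_demo are made-up names). The UNIQUE key on Ccontent is an assumption added here: it guarantees that equal strings always map to the same id, which is what makes COUNT(DISTINCT C) on the ids give the same answer as on the strings.

```sql
DROP TABLE IF EXISTS T_demo4;
DROP TABLE IF EXISTS C_demo;

-- Lookup table: one row per distinct C string.
CREATE TABLE C_demo (
    id       INT NOT NULL,
    Ccontent VARCHAR(255) DEFAULT NULL,
    PRIMARY KEY (id),
    UNIQUE (Ccontent)
);

-- T now stores the id in C instead of the string itself.
CREATE TABLE T_demo4 (
    A INT NOT NULL,
    B VARCHAR(45) NOT NULL,
    C INT DEFAULT NULL,
    D INT NOT NULL,
    PRIMARY KEY (A, B, D)
);

INSERT INTO C_demo (id, Ccontent) VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO T_demo4 (A, B, C, D) VALUES
    (1, 'b1', 1, 1), (1, 'b1', 2, 2),
    (1, 'b2', 1, 1), (1, 'b2', 1, 2),
    (1, 'b3', NULL, 1),
    (2, 'b2', 3, 1);

-- The uniqueness test now compares integers; returns b1 and b3.
SELECT   B
FROM     T_demo4
WHERE    A = 1
GROUP BY B
HAVING   COUNT(DISTINCT C) = COUNT(C)
ORDER BY B;

-- Translate ids back to strings only at the end, e.g. the C values of b1.
SELECT   t.B, c.Ccontent
FROM     T_demo4 AS t
JOIN     C_demo  AS c ON c.id = t.C
WHERE    t.A = 1 AND t.B = 'b1'
ORDER BY c.Ccontent;
-- Returns (b1, x) and (b1, y).
```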

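I'd prefer option 2. By the way, your index iA is probably redundant: A is the leftmost column of the primary key (A,B,D), so lookups on A can already use the primary key (which is what the EXPLAIN output shows).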

Upvotes: 2

Dojo

Reputation: 5684

Why not do COUNT(DISTINCT C)=1 instead?
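A quick way to check whether COUNT(DISTINCT C) = 1 answers the same question is to evaluate both conditions side by side on toy data (T_demo5 and its rows are hypothetical). COUNT(DISTINCT C) = 1 keeps B values whose non-NULL C values are all the same, whereas COUNT(DISTINCT C) = COUNT(C) keeps B values whose non-NULL C values are all different, so a B passes both tests only when it has exactly one non-NULL C value.

```sql
DROP TABLE IF EXISTS T_demo5;
CREATE TABLE T_demo5 (
    A INT NOT NULL,
    B VARCHAR(45) NOT NULL,
    C VARCHAR(255) DEFAULT NULL,
    D INT NOT NULL,
    PRIMARY KEY (A, B, D)
);
INSERT INTO T_demo5 (A, B, C, D) VALUES
    (1, 'b1', 'x',  1), (1, 'b1', 'y', 2),  -- two different C values
    (1, 'b2', 'x',  1), (1, 'b2', 'x', 2),  -- the same C value twice
    (1, 'b3', NULL, 1),                     -- only NULL
    (2, 'b2', 'z',  1);

-- Each comparison evaluates to 1 (true) or 0 (false) per B.
SELECT   B,
         COUNT(DISTINCT C) = COUNT(C) AS all_c_distinct,
         COUNT(DISTINCT C) = 1        AS single_c_value
FROM     T_demo5
WHERE    A = 1
GROUP BY B
ORDER BY B;
-- b1 | 1 | 0
-- b2 | 0 | 1
-- b3 | 1 | 0
```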

Upvotes: 0
