Reputation: 107
Need to decide which query is faster:
Query1
select xyz.user, count(*) as score
from xyz join
(select qid, min(time) as mintime
from xyz
group by qid
) q
on xyz.qid = q.qid and xyz.time = q.mintime
group by xyz.user;
Query2
select user,count(*)
from (select *
from (select *
from xyz
order by time ASC
) as temp1
group by temp1.qid
) As temp2
group by temp2.user
both return the number of times that a user is "first" for each qid.
DB:
CREATE TABLE xyz (
id INT PRIMARY KEY AUTO_INCREMENT,
user VARCHAR(20),
time INT,
qid INT
);
INSERT INTO xyz VALUES ( 1 , 'abc' , 15 , 1);
INSERT INTO xyz VALUES ( 2 , 'abc' , 6 , 1);
INSERT INTO xyz VALUES ( 3 , 'xyz' , 11 , 1);
INSERT INTO xyz VALUES ( 4 , 'abc' , 4 , 1);
INSERT INTO xyz VALUES ( 5 , 'xyz' , 13 , 2);
INSERT INTO xyz VALUES ( 6 , 'abc' , 11 ,2);
INSERT INTO xyz VALUES ( 7 , 'abc' , 9 , 3);
INSERT INTO xyz VALUES ( 8 , 'xyz' , 10 , 3);
INSERT INTO xyz VALUES ( 9 , 'xyz' , 2 , 3);
INSERT INTO xyz VALUES ( 10 , 'xyz' , 2 , 4);
Edit: As suggested replaced better with faster.
Upvotes: 0
Views: 72
Reputation: 4423
This is a typical performance scenario regarding speed. As for any such circumstance you have to test and measure.
That in itself is a quite tricky task since your queries will be cached by the db engine after the first run and the following executions will be considerably faster. The test and measure of performance in a system needs you to consider a lot of variables to be done correctly.
On the other side:
My estimation is that the query with the JOIN
would be the fastest. Modern sql DB engines are good at optimization and a JOIN
would allow the engine to freely optimize the query.
Upvotes: 1
Reputation: 72177
Before anything else you should add indexes to your table on the columns you use for search or comparison:
ALTER TABLE `xyz`
ADD INDEX (`qid`),
ADD INDEX (`time`);
Then put EXPLAIN
in front of each query and check the results against the recommendations you can find in the manual. Look closely the sections about "Join Types" and "Extra Information".
With or without indexes, avoid the second query. It is the worst and it cannot be improved.
I propose you an alternative query that produces the same results and is even faster than your first approach:
SELECT uif.user, COUNT(*) AS score
FROM xyz uif # "uif" from "user is first"
LEFT JOIN xyz sm # "sm" from "smaller time"
ON uif.qid = sm.qid AND sm.time < uif.time
WHERE sm.time IS NULL # keep only when there is no "smaller time"
GROUP BY uif.user
It joins the xyz
table (aliased as uif
) against itself (aliased as sm
). Each row from uif
is paired with all the rows from sm
that have the same qid
and smaller times (sm.time < uif.time
). The LEFT JOIN
ensures that all the rows from uif
will appear in the joined set. When a row from uif
does not have a pair from sm
(because there is no row in sm
having a smaller time), the columns from sm
are filled with NULL
.
The WHERE
condition keeps only these rows from uif
that have no pair in sm
(there is no "smaller time"); that means it keeps only the rows from uif
that have the smaller times.
The SELECT
and GROUP BY
clauses take care of your original goal (return the number of times). Always put only columns from uid
in the SELECT
clauses (because those of sm
are all NULL
, of course).
Upvotes: 1
Reputation: 108641
Your first query contains this subquery.
select qid, min(time) as mintime
from xyz
group by qid
It can be made very efficient by a compound index on (qid,time)
. MySQL will satisfy that query with a so-called loose index scan. So if I were you I would go with your first query.
But you should use EXPLAIN
to make sure I am right. You should revisit this question sometime in the future when your tables have 100 times the number of rows they have now.
Upvotes: 1
Reputation: 3872
SELECT user, count(*) AS score
FROM xyz JOIN
(SELECT qid, min(time) AS mintime
FROM xyz
GROUP by qid
) q
ON qid = q.qid AND time = q.mintime
GROUP BY user;
I believe xyz.gid
and xyz.time
is not necessary
Upvotes: 0