Reputation: 67
So, I have a database with 5 million records and it stores the information for a question and answer site. The structure is...
question, qid, quserid, answer, auserid;
(qid is the id number for the question)
I'm trying to find all the users that have questions NOT answered by a particular user and sorting it by the amount of questions that weren't answered by a particular user. Here's my query:
SELECT quserid, COUNT(quserid)
FROM `qanda`
WHERE qid NOT IN (SELECT qid FROM `qanda` WHERE auserid='myusername' GROUP BY qid)
GROUP BY quserid
ORDER BY COUNT(quserid) DESC
LIMIT 0,1000;
The problem: it's taking over 2 hours and the clock is still ticking away! Anyone know how to speed this database up? I think there's a bug in the database or something, usually it only takes me 30 seconds for a simple query at most, so this is getting a bit ridiculous. Anyone know any tweaks? Possibly a simple change to the config file or something?
..........
Here's some data from the database I just copied and pasted. Sorry for the lack of formatting.
you could have any one person in the entire wor... greendaystud ive got the person i want...its great...because sh... •glitter•rock• 191437 If you could have any one person in the entire wor... just~another~slave2tears i already got em
•glitter•rock• 191437 If you could have any one person in the entire wor... korn_chick2007 matt or chris... i have feelings for them
•glitter•rock• 189555 why are you so sexy?
just~another~slave2tears my b/f says i am...i dun tink so tho
•glitter•rock• 189555 why are you so sexy?
korn_chick2007 im not
•glitter•rock• 189555 why are you so sexy?
MyKool-AidsSexy i dont think i am
†brokengirl† 115228 If you are supposed to expect the unexpected,
doe... death-tone yip
†brokengirl† 115228 If you are supposed to expect the unexpected,
doe... _doieverknowwhoiam_ you know whats weird? my friend sandy says that a ...
†brokengirl† 115228 If you are supposed to expect the unexpected,
doe... Cute_Physco_kitty Pretty much..
†brokengirl† 115228 If you are supposed to expect the unexpected,
doe... Leslie02 WHAT! OK, now im confused!
†brokengirl† 114995 Why does my listerine taste like sausage this
mor... death-tone what's listerine?
†brokengirl† 114995 Why does my listerine taste like sausage this
mor... _doieverknowwhoiam_ i don't know, and maybe it's jut me bu...
†brokengirl† 114995 Why does my listerine taste like sausage this
mor... darksunofdeath How old is the listerine pack?
†brokengirl† 114995 Why does my listerine taste like sausage this
mor... Cute_Physco_kitty uhh... New brand of Listerine?
†brokengirl† 114995 Why does my listerine taste like sausage this
mor... Leslie02 did you have sausage for breakfast? †brokengirl† 104305 What should I name my pinky toe on my left
foot?¿... death-tone "Pinkytoe"
And, the expected output, using convenient column titles...
Questioner User ID | Number of questions asked by the Questioner that were unanswered by 'myuserid'
Greenbay Packer | 6
DollyDoll | 63
PsychoticPokemon | 62
HelloKitty | 61
GreenDayFan | 60
...
IDontAskManyQuestion | 2<br>
WhatsAQuestion? | 1<br>
And here's the EXPLAIN output
> mysql-> EXPLAIN
> ->
> -> SELECT quserid, COUNT(quserID)
> -> FROM `qanda`
> -> WHERE qid NOT IN (SELECT qid FROM `qanda` WHERE auserid='boxocereal' GROU P BY qid)
> -> GROUP BY quserid
> -> ORDER BY COUNT(quserid) DESC
> -> LIMIT 0,1000;
>
+----+--------------------+-------+------+---------------+------+---------+-----
> -+---------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len |
> ref | rows | Extra |
> +----+--------------------+-------+------+---------------+------+---------+-----
> -+---------+----------------------------------------------+ | 1 | PRIMARY | qanda | ALL | NULL | NULL | NULL |
> NULL | 3167995 | Using where; Using temporary; Using filesort | | 2
> | DEPENDENT SUBQUERY | qanda | ALL | NULL | NULL | NULL |
> NULL | 3167995 | Using where; Using temporary; Using filesort |
> +----+--------------------+-------+------+---------------+------+---------+-----
> -+---------+----------------------------------------------+ 2 rows in set (0.02 sec)
>
> mysql->
Upvotes: 0
Views: 689
Reputation: 18290
Okay, I have an idea for translating this into a JOIN instead of a subquery. The technique is to actually JOIN where there is and answer from that user (since that is radically more efficient), and then just exclude those questions from the final results (in HAVING). It is possible that this could be improved further (I haven't tested, but the IS NULL check could possibly be moved into a WHERE clause instead).
Does this get you where you want to be?
SELECT question.quserid, COUNT(question.quserid) as num_questions
FROM qanda as question
LEFT OUTER JOIN qanda as answers
ON question.qid = answers.qid AND answers.auserid = 'myusername'
GROUP BY question.quserid
ORDER BY num_questions DESC
HAVING answers.auserid IS NULL;
EDIT: a bit more explanation, in case this is close and someone else can help refine the idea into a solution.
Basically, the main part of the query will select every question exactly once (LEFT OUTER), and then for the questions which the user of interest has answered, it will also select the JOIN'd columns. This brings us to a state of have all questions in the candidate result set, with 'flags' attached to the ones we are not interested in (i.e. non-null joined columns). Then we group and reject the rows with non-null join'd data.
Upvotes: 2
Reputation: 115510
Try this one, too:
SELECT quserid, COUNT(*) AS num
FROM qanda
WHERE qid NOT IN
( SELECT qid
FROM qanda
WHERE auserid = 'user2'
)
GROUP BY quserid
ORDER BY num DESC
LIMIT 0,1000 ;
You'll definitely need to add indices for any of these variants to be efficient.
Is the (qid, quserid, auserid)
the PRIMARY
key? Is it UNIQUE
(I guess it should be one of the two.
An index on (quserid, qid)
would also help (if your table uses the InnoDB engine).
Upvotes: 0
Reputation: 31627
With reference to @ctrahey answer I have changed query and got solution working as @ctrahey was not giving output as yours as shown here.
Could you try with and let me know if this is faster?
SELECT question.quserid, COUNT(question.quserid) as num_questions
FROM qanda as question
LEFT OUTER JOIN qanda as answers
ON question.qid = answers.qid AND answers.auserid = 'user2'
WHERE answers.auserid IS NULL
GROUP BY question.quserid
ORDER BY num_questions DESC;
Also see sqlfiddle. Your query and above query are giving same output.
Hope this is what you wanted.
Upvotes: 0
Reputation: 11
Based on the output , try to find out the candidates where you can implement indexing if there is a full table scan is happening .
Upvotes: 0