Reputation: 786
I have a MySQL database with 3 tables
Keywords
id, keyword, projects_id
Year
results_id*, jan, feb, mar, ..., nov, dec
Results
id, keywords_id*, country, user
star = foreign key
I need to find the total yearly volume generated by any keyword that exist in project 1 and also exists in projects 2,3 and 4 grouped by country and by user.
This is what i have so far
SELECT SUM(y.january +
y.february +
...
y.december) AS 'sum',
r.country , r.user
FROM results r, year y
WHERE y.results_id = r.id
AND keywords_id IN(
SELECT DISTINCT k.id
FROM keywords k
JOIN keywords kk ON k.keyword = kk.keyword
WHERE k.projects_id = 1
AND kk.projects_id IN (2,3,4)
)
GROUP BY country, user;
My logic is this:
i tried the subquery for step one (the one within the parenthesis) and it returns over 1700 keywords
but when i try the whole keyword after 30 mins i get no results.
How can i rewrite the query to speed it up, or if i am doing something wrong what is it?
thanks a lot in advance
Upvotes: 0
Views: 46
Reputation: 48139
Here is how I would write it. The first query gets all keywords that must be in all 4 projects 1, 2, 3 and 4 via qualifier of project 1 keywords and JOIN to other for 2, 3 and 4. If you want a minimum of project 1 and ANY of 2, 3 or 4, I would change it slightly.
From that, only then join to results and to the years table. Now, to help on optimization. Your keywords table should have an index on ( id, projects_id ). Results table should have an index on (keywords_id, country, user ) -- the country and user to help in the group by.
select STRAIGHT_JOIN
r.country,
r.user,
SUM( y.january + y.february + y.march
+ y.april + y.may + y.june
+ y.july + y.august + y.september
+ y.october + y.november + y.december ) as AllMonths
from
( SELECT k.id
FROM keywords k
JOIN keywords k2 on k.id = k2.id AND k2.project_id = 2
JOIN keywords k3 on k.id = k3.id AND k3.project_id = 3
JOIN keywords k4 on k.id = k4.id AND k4.project_id = 4
where
k.project_id = 1 ) KeywordsInAll
JOIN results r
ON KeywordsInAll.ID = r.keywords_id
JOIN `year` y
ON r.id = y.results_id
group by
r.country,
r.user
I changed the inner query to START with keywords AT LEAST from project 1... No sense in querying 10,000 keywords if project 1 only has 15 keywords (exaggerated, but example)
From your posted comment, I would just change the prequery to almost exact as yours, but keep in first position and retain the "STRAIGHT_JOIN"
( SELECT DISTINCT k.id
FROM keywords k
JOIN keywords k2 on k.id = k2.id
AND k2.project_id IN (2, 3, 4 )
where
k.project_id = 1 ) KeywordsInAll
Upvotes: 1