Jon
Jon

Reputation: 786

optimizing a MySQL query

I have a MySQL database with 3 tables

Keywords
id, keyword, projects_id

Year
results_id*, jan, feb, mar, ..., nov, dec

Results
id, keywords_id*, country, user

star = foreign key

I need to find the total yearly volume generated by any keyword that exist in project 1 and also exists in projects 2,3 and 4 grouped by country and by user.

This is what i have so far

SELECT SUM(y.january + 
           y.february +
           ...
           y.december) AS 'sum',
       r.country , r.user
       FROM results r, year y           
       WHERE y.results_id = r.id 
             AND keywords_id IN(
                                   SELECT DISTINCT k.id
                                   FROM keywords k 
                                   JOIN keywords kk ON k.keyword = kk.keyword
                                   WHERE k.projects_id = 1 
                                   AND kk.projects_id IN (2,3,4)
                                   )
       GROUP BY country, user;

My logic is this:

  1. Find the ids of all the keywords that belong to project 1 that are also in projects 2,3,4 with a JOIN.
  2. Then find any result with a keywords_id entry containing any of those keyword ids
  3. finally add all the volumes together from the year table and group by country and user

i tried the subquery for step one (the one within the parenthesis) and it returns over 1700 keywords

but when i try the whole keyword after 30 mins i get no results.

How can i rewrite the query to speed it up, or if i am doing something wrong what is it?

thanks a lot in advance

Upvotes: 0

Views: 46

Answers (1)

DRapp
DRapp

Reputation: 48139

Here is how I would write it. The first query gets all keywords that must be in all 4 projects 1, 2, 3 and 4 via qualifier of project 1 keywords and JOIN to other for 2, 3 and 4. If you want a minimum of project 1 and ANY of 2, 3 or 4, I would change it slightly.

From that, only then join to results and to the years table. Now, to help on optimization. Your keywords table should have an index on ( id, projects_id ). Results table should have an index on (keywords_id, country, user ) -- the country and user to help in the group by.

select STRAIGHT_JOIN
      r.country,
      r.user,
      SUM( y.january + y.february + y.march
         + y.april   + y.may      + y.june
         + y.july    + y.august   + y.september
         + y.october + y.november + y.december ) as AllMonths
   from 
      ( SELECT k.id
           FROM keywords k
              JOIN keywords k2 on k.id = k2.id AND k2.project_id = 2
              JOIN keywords k3 on k.id = k3.id AND k3.project_id = 3
              JOIN keywords k4 on k.id = k4.id AND k4.project_id = 4
           where 
              k.project_id = 1 ) KeywordsInAll             
         JOIN results r
            ON KeywordsInAll.ID = r.keywords_id
            JOIN `year` y
               ON r.id = y.results_id
   group by 
      r.country,
      r.user

I changed the inner query to START with keywords AT LEAST from project 1... No sense in querying 10,000 keywords if project 1 only has 15 keywords (exaggerated, but example)

From your posted comment, I would just change the prequery to almost exact as yours, but keep in first position and retain the "STRAIGHT_JOIN"

      ( SELECT DISTINCT k.id
           FROM keywords k
              JOIN keywords k2 on k.id = k2.id 
               AND k2.project_id IN (2, 3, 4 )
           where 
              k.project_id = 1 ) KeywordsInAll             

Upvotes: 1

Related Questions