Vipin Singh
Vipin Singh

Reputation: 542

Query Optimization for large database

Hi i am in need of help to optimize a query for large database records above 1 Millions . Current query is taking 27-30 seconds to execute.

SELECT SQL_CALC_FOUND_ROWS
candidate.candidate_id AS candidateID,
candidate.candidate_id AS exportID,
candidate.is_hot AS isHot,
candidate.date_modified AS dateModifiedSort,
candidate.date_created AS dateCreatedSort,
candidate.first_name AS firstName,
candidate.last_name AS lastName,
candidate.city AS city,
candidate.state AS state,
candidate.key_skills AS keySkills,
owner_user.first_name AS ownerFirstName,
owner_user.last_name AS ownerLastName,
CONCAT(owner_user.last_name,
        owner_user.first_name) AS ownerSort,
DATE_FORMAT(candidate.date_created, '%m-%d-%y') AS dateCreated,
DATE_FORMAT(candidate.date_modified, '%m-%d-%y') AS dateModified,
candidate.email2 AS email2 FROM
candidate
    LEFT JOIN
user AS owner_user ON candidate.owner = owner_user.user_id
    LEFT JOIN
saved_list_entry ON saved_list_entry.data_item_type = 100
    AND saved_list_entry.data_item_id = candidate.candidate_id WHERE
is_active = 1 GROUP BY candidate.candidate_id ORDER BY    dateModifiedSort 
DESC LIMIT 0 , 15

is there any method to reduce the execution time of the query. I have also added the index in tables but it is not working fine.

Indexes

Upvotes: 2

Views: 154

Answers (4)

Rick James
Rick James

Reputation: 142298

  1. Get rid of saved_list_entry, it adds nothing.

  2. Delay joining to user. This will let you get rid of the GROUP BY, which is adding a bunch of time, and possibly inflating the value of FOUND_ROWS().

Something like:

SELECT  c2.*,
        ou.first_name AS ownerFirstName,
        ou.last_name AS ownerLastName,
        CONCAT(ou.last_name, ou.first_name) AS ownerSort,
    FROM  
      ( SELECT  SQL_CALC_FOUND_ROWS
                c.candidate_id AS candidateID, c.candidate_id AS exportID,
                c.is_hot AS isHot, c.date_modified AS dateModifiedSort,
                c.date_created AS dateCreatedSort, c.first_name AS firstName,
                c.last_name AS lastName, c.city AS city, c.state AS state,
                c.key_skills AS keySkills,
                DATE_FORMAT(c.date_created, '%m-%d-%y') AS dateCreated,
                DATE_FORMAT(c.date_modified, '%m-%d-%y') AS dateModified,
                c.email2 AS email2
            FROM  candidate AS c
            WHERE  is_active = 1
            GROUP BY  c.candidate_id
            ORDER BY  c.date_modified DESC  -- note change here
            LIMIT  0 , 15 
      ) AS c2
    LEFT JOIN  user AS ou  ON c2.owner = ou.user_id;

(I messed up the column order, but you can fix that.)

Index needed:

candidate:  INDEX(is_active, candidate_id, date_modified)

Upvotes: 1

DRapp
DRapp

Reputation: 48139

First, a candidate, I would suspect is always a single entry by the ID, so why you are doing a GROUP BY is beyond me, that can PROBABLY be removed and improve a little.

Second, you are doing a left-join to the "saved_list_entry" table, but not actually pulling any columns from it, so this could probably be removed completely.

Third, from consideration of the GROUP BY being no longer applicable, I would suggest updating your indexes as:

table             index
CANDIDATE         ( is_active, date_modified, candidate_id, owner )
user              ( user_id )
saved_list_entry  ( data_item_id, data_item_type )

Since your order is by the date modified in descending order, having IT in the second position to the is_active (Where condition), it will plow through your first 15 quickly. However, your SQL_CALC_FOUND_ROWS will still have to plow through all other qualifying, but the result set would be pre-ordered by the index to match.

SELECT SQL_CALC_FOUND_ROWS
      c.candidate_id AS candidateID,
      c.candidate_id AS exportID,
      c.is_hot AS isHot,
      c.date_modified AS dateModifiedSort,
      c.date_created AS dateCreatedSort,
      c.first_name AS firstName,
      c.last_name AS lastName,
      c.city AS city,
      c.state AS state,
      c.key_skills AS keySkills,
      u.first_name AS ownerFirstName,
      u.last_name AS ownerLastName,
      CONCAT(u.last_name, u.first_name) AS ownerSort,
      DATE_FORMAT(c.date_created, '%m-%d-%y') AS dateCreated,
      DATE_FORMAT(c.date_modified, '%m-%d-%y') AS dateModified,
      c.email2 AS email2 
   FROM
      candidate c
         LEFT JOIN user u
            ON c.owner = u.user_id
         LEFT JOIN saved_list_entry s
            ON c.candidate_id = s.data_item_id
            AND s.data_item_type = 100
   WHERE
      c.is_active = 1 
   GROUP BY 
      c.candidate_id 
   ORDER BY    
      c.date_modified DESC 
   LIMIT 
      0, 15

Upvotes: 1

Andrews B Anthony
Andrews B Anthony

Reputation: 1381

I've changed changed the table alias in the below query, use this This Must Solve Your Problem

SELECT SQL_CALC_FOUND_ROWS
candidate.candidate_id AS candidateID,
candidate.candidate_id AS exportID,
candidate.is_hot AS isHot,
candidate.date_modified AS dateModifiedSort,
candidate.date_created AS dateCreatedSort,
candidate.first_name AS firstName,
candidate.last_name AS lastName,
candidate.city AS city,
candidate.state AS state,
candidate.key_skills AS keySkills,
user.first_name AS ownerFirstName,
user.last_name AS ownerLastName,
CONCAT(user.last_name,
        user.first_name) AS ownerSort,
DATE_FORMAT(candidate.date_created, '%m-%d-%y') AS dateCreated,
DATE_FORMAT(candidate.date_modified, '%m-%d-%y') AS dateModified,
candidate.email2 AS email2 FROM
candidate
    LEFT JOIN
user ON candidate.owner = user.user_id
    LEFT JOIN
saved_list_entry ON saved_list_entry.data_item_type = 100
    AND saved_list_entry.data_item_id = candidate.candidate_id WHERE
is_active = 1 GROUP BY candidate.candidate_id ORDER BY    dateModifiedSort 
DESC LIMIT 0 , 15

use the below queries to create indexes for join conditions

create index index_user user(user_id);

create index index_saved_list_entry saved_list_entry(data_item_type,data_item_id);

create index index_candidate candidate(is_active,candidate_id,dateModifiedSort);

Upvotes: 1

O. Jones
O. Jones

Reputation: 108651

You're using the query pattern

     SELECT a vast bunch of stuff
       FROM a complex assembly of JOIN operations
      ORDER BY some variable DESC
      LIMIT 0,small number

This is inherently inefficient: to satisfy your query the MySQL server must construct a vast result set, then it must sort the whole thing, then it takes the first fifteen rows and discards the rest.

To make this more efficient, you need to sort less stuff. Here's a way to do that. It looks like you want to find the most recently modified fifteen candidates. This query will, fairly cheaply, retrieve the IDs of those candidates. It exploits one of your indexes.

                   SELECT candidate_id
                     FROM candidate
                    ORDER BY date_modified DESC
                    LIMIT 0, 15

Then, you can use that as a subquery in your main query. Add a clause like this:

  WHERE candidate.candidate_id IN (
                   SELECT candidate_id
                     FROM candidate
                    ORDER BY date_modified DESC
                    LIMIT 0, 15)

to your query in the appropriate place.

Notice also that you're using a nonstandard and potentially harmful MySQL specific extension to GROUP BY. Your query works, but if a candidate has more than one owner, it will return just one after selecting it randomly.

Finally, you seem to have placed single-column indexes on many of the columns in your large table. This is a notorious SQL antipattern: all those indexes slow down INSERT and UPDATE operations, and most of them probably serve no purpose in speeding up a query. Certainly, for this query, the only useful indexes are the one on date_modified and the primary key.

Many complex queries are satisfied best using specific multi-column indexes. A bunch of single-column indexes are no help to such queries.

Upvotes: 1

Related Questions