deep adeshra
deep adeshra

Reputation: 60

How to improve performance of ORDER BY in mysql?

Hi I am having one table with approximately one billion rows. Doing ORDER BY on indexed field is taking around 3 seconds to fetch 30 records with LIMIT, whereas without ORDER BY it takes 195ms. I want to speed up this.

can anyone help me out with this ?

here is the simplified version of query (have removed some fields and few joins).

SELECT DISTINCT `auctions_opportunity`.`id`,
                `auctions_opportunity`.`employer_id`,
                `auctions_opportunity`.`salary`,
                `auctions_opportunity`.`is_active`,
                `auctions_opportunity`.`is_interested`,
                `auctions_opportunity`.`interview_status`,
                `auctions_opportunity`.`previous_interview_status`,
                `auctions_opportunity`.`creation_source`,
                `auctions_opportunity`.`created_at`,
                `auctions_opportunity`.`last_modified`,
                `auctions_opportunity`.`last_instant_alert_email_at`,
                `auctions_opportunity`.`last_daily_alert_email_at`,
                `auctions_opportunity`.`last_periodic_alert_email_at`,
                `auctions_opportunity`.`reviewed_at`,
                `auctions_opportunity`.`last_modified_by_id`,
                `auctions_opportunity`.`candidate_id`,
                `auctions_opportunity`.`job_id`,
                `auctions_opportunity`.`interview_request_notes`,
                `auctions_opportunity`.`application_email_at`,
                `auctions_opportunity`.`batch_application_email_at`,
                `auctions_opportunity`.`is_location_match`,
                `auctions_opportunity`.`is_strong_match`,
                `auctions_opportunity`.`score`,
                `auctions_opportunity`.`es_score`,
                `auctions_opportunity`.`message`,
                `auctions_opportunity`.`es_maybe`,
                `candidates_candidate`.`id`,
                `candidates_candidate`.`user_id`,
                `candidates_candidate`.`phone`,
                `candidates_candidate`.`is_new`,
                `candidates_candidate`.`last_seen`,
                `candidates_candidate`.`last_agreed_to_terms_at`,
                `candidates_candidate`.`email_backend_status`,
                `candidates_candidate`.`email_suppressed_at`,
                `candidates_candidate`.`email_verified_at`,
                `candidates_candidate`.`number_verified_at`,
                `candidates_candidate`.`last_emailed_at`,
                `candidates_candidate`.`last_updated`,
                `candidates_candidate`.`internal_note`,
                `candidates_candidate`.`main_skills`,
                `candidates_candidate`.`main_skills_nopunc`,
                `candidates_candidate`.`total_experience`,
                `candidates_candidate`.`current_company`,
                `candidates_candidate`.`current_company_nopunc`,
                `candidates_candidate`.`current_designation`,
                `candidates_candidate`.`onboarding_completed_at`,
                `candidates_candidate`.`previously_onboarded`,
                `candidates_candidate`.`talent_advocate_id`,
                `candidates_candidate`.`job_function_skills`,
                `candidates_candidate`.`availability`,
                `candidates_candidate`.`deactivated_at`,
                `candidates_candidate`.`deactivated_by_id`,
                `candidates_candidate`.`deactivation_source`,
                `candidates_candidate`.`is_private`,
                `candidates_candidate`.`previous_companies`,
                `candidates_candidate`.`companies_interned_at`,
                `candidates_candidate`.`gender`,
                `candidates_candidate`.`skype_id`,
                `candidates_candidate`.`alternate_phone`,
                `candidates_candidate`.`shadow_linkedin`,
                `candidates_candidate`.`last_job_post_email`,
                `candidates_candidate`.`last_job_alert_email`,
                `candidates_candidate`.`last_joining_email_sent`,
                `candidates_candidate`.`last_hired_email_sent`,
                `candidates_candidate`.`last_reonboarding_email`,
                `candidates_candidate`.`last_indexed_at`,
                `candidates_candidate`.`resume_viewed_notification_type`,
                `candidates_candidate`.`last_resume_viewed_email_at`,
                `candidates_candidate`.`seen_go_premium_modal_at`,
                `candidates_candidate`.`seen_active_check_modal_at`,
                `candidates_candidate`.`alerts_limit_reached_at`,
                `candidates_candidate`.`calculation_done_at`,
                `candidates_candidate`.`calculation_attempted_at`,
                `candidates_candidate`.`bio`,
                `candidates_candidate`.`last_seen_activity_at`,
                `candidates_candidate`.`job_unsubscribed_at`,
                `candidates_candidate`.`monthly_alerts_unsubscribed_at`,
                `candidates_candidate`.`resume_views_unsubscribed_at`,
                `candidates_candidate`.`update_preferences_unsubscribed_at`,
                `candidates_candidate`.`onboarding_reminder_unsubscribed_at`,
                `candidates_candidate`.`is_hireable`,
                `candidates_candidate`.`recruiter_message_push_unsubscribed_at`,
                `candidates_candidate`.`resume_views_email_unsubscribed_at`,
                `candidates_candidate`.`resume_views_push_unsubscribed_at`,
                `candidates_candidate`.`profile_reminder_email_unsubscribed_at`,
                `candidates_candidate`.`profile_reminder_push_unsubscribed_at`,
                `candidates_candidate`.`newsletter_email_unsubscribed_at`,
                `candidates_candidate`.`newsletter_push_unsubscribed_at`,
                `candidates_candidate`.`product_updates_email_unsubscribed_at`,
                `candidates_candidate`.`product_updates_push_unsubscribed_at`,
                `candidates_candidate`.`push_notifications_shown_at`,
                `candidates_candidate`.`push_notifications_enabled`,
                `candidates_candidate`.`push_notifications_verified_at`,
                `candidates_candidate`.`whatsapp_number`,
                `candidates_candidate`.`whatsapp_enabled`,
                `candidates_candidate`.`whatsapp_verified_at`,
                `candidates_candidate`.`last_whatsapp_sent_at`,
                `auth_user`.`id`,
                `auth_user`.`password`,
                `auth_user`.`last_login`,
                `auth_user`.`is_superuser`,
                `auth_user`.`username`,
                `auth_user`.`first_name`,
                `auth_user`.`last_name`,
                `auth_user`.`email`,
                `auth_user`.`is_staff`,
                `auth_user`.`is_active`,
                `auth_user`.`date_joined`
FROM `auctions_opportunity`
INNER JOIN `employers_employer` ON (`auctions_opportunity`.`employer_id` = `employers_employer`.`id`)
INNER JOIN `jobs_job` ON (`auctions_opportunity`.`job_id` = `jobs_job`.`id`)
INNER JOIN `profiles_profilestatus` ON (`employers_employer`.`status_id` = `profiles_profilestatus`.`id`)
INNER JOIN `candidates_candidate` ON (`auctions_opportunity`.`candidate_id` = `candidates_candidate`.`id`)
INNER JOIN `auth_user` ON (`candidates_candidate`.`user_id` = `auth_user`.`id`)
INNER JOIN `candidates_resume` ON (`candidates_candidate`.`id` = `candidates_resume`.`candidate_id`)
LEFT OUTER JOIN `candidates_jobsearchpreferences` ON (`candidates_candidate`.`id` = `candidates_jobsearchpreferences`.`candidate_id`)
LEFT OUTER JOIN `candidates_customaction` ON (`auctions_opportunity`.`id` = `candidates_customaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_emailaction` ON (`auctions_opportunity`.`id` = `candidates_emailaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_saveaction` ON (`auctions_opportunity`.`id` = `candidates_saveaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_hideaction` ON (`auctions_opportunity`.`id` = `candidates_hideaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_hireaction` ON (`auctions_opportunity`.`id` = `candidates_hireaction`.`opportunity_id`)
WHERE (`auctions_opportunity`.`employer_id` = 4
       AND NOT (`auctions_opportunity`.`job_id` IS NULL)
       AND `profiles_profilestatus`.`name` = Approved
       AND NOT (`auth_user`.`email` = [email protected])
       AND NOT (`candidates_candidate`.`availability` = 3)
       AND NOT (`auctions_opportunity`.`job_id` IS NULL)
       AND NOT (`candidates_resume`.`id` IS NULL)
       AND (`jobs_job`.`is_active` = TRUE
            AND `auctions_opportunity`.`is_active` = TRUE)
       AND ((((`candidates_candidate`.`is_hireable` = TRUE
               AND `candidates_candidate`.`last_seen` >= 2020-12-18 09:19:42.873898
               AND `candidates_jobsearchpreferences`.`status` = 0)
              OR (`candidates_candidate`.`is_hireable` = TRUE
                  AND (NOT (`candidates_jobsearchpreferences`.`status` = 0)
                       OR (`candidates_candidate`.`last_seen` < 2020-12-18 09:19:42.873989
                           AND `candidates_jobsearchpreferences`.`status` = 0))))
             AND `candidates_candidate`.`is_private` = FALSE)
            OR `auctions_opportunity`.`interview_status` = 1
            OR NOT (`auctions_opportunity`.`id` IN
                      (SELECT U0.`id` AS Col1
                       FROM `auctions_opportunity` U0
                       LEFT OUTER JOIN `candidates_emailaction` U1 ON (U0.`id` = U1.`opportunity_id`)
                       WHERE (U1.`reply_email_at` IS NULL
                              AND U0.`id` = (`auctions_opportunity`.`id`)))))
       AND NOT (`auctions_opportunity`.`candidate_id` IN
                  (SELECT U2.`candidate_id` AS Col1
                   FROM `candidates_blockedemployerintermediate` U2
                   WHERE U2.`employer_id` = 4))
       AND `auctions_opportunity`.`job_id` IN (40729)
       AND NOT (((`auctions_opportunity`.`creation_source` = 8
                  AND `auctions_opportunity`.`creation_source` IS NOT NULL)
                 OR (`auctions_opportunity`.`interview_status` = 1
                     AND `auctions_opportunity`.`is_active` = FALSE)))
       AND ((((`candidates_candidate`.`is_hireable` = TRUE
               AND (NOT (`candidates_jobsearchpreferences`.`status` = 0
                         AND `candidates_jobsearchpreferences`.`status` IS NOT NULL)
                    OR (`candidates_candidate`.`last_seen` < 2020-12-18 09:19:42.863410
                        AND `candidates_jobsearchpreferences`.`status` = 0)))
              OR (`candidates_candidate`.`is_hireable` = TRUE
                  AND `candidates_jobsearchpreferences`.`status` = 2))
             AND `candidates_candidate`.`is_private` = FALSE
             AND `auctions_opportunity`.`interview_status` = 0
             AND (`auctions_opportunity`.`is_location_match` = TRUE
                  OR ((`candidates_jobsearchpreferences`.`current_location` LIKE %Delhi%
                       OR `candidates_jobsearchpreferences`.`current_location` LIKE %Noida%
                       OR `candidates_jobsearchpreferences`.`current_location` LIKE %Gurgaon%
                       OR `candidates_jobsearchpreferences`.`current_location` LIKE %Faridabad%
                       OR `candidates_jobsearchpreferences`.`current_location` LIKE %Greater Noida%)
                      AND (`jobs_job`.`locations` LIKE %Delhi%
                           OR `jobs_job`.`locations` LIKE %Noida%
                           OR `jobs_job`.`locations` LIKE %Gurgaon%
                           OR `jobs_job`.`locations` LIKE %Faridabad%
                           OR `jobs_job`.`locations` LIKE %Greater Noida%)
                      AND `jobs_job`.`accept_outstation` = FALSE)))
            OR (((`candidates_candidate`.`is_hireable` = TRUE
                  AND (NOT (`candidates_jobsearchpreferences`.`status` = 0
                            AND `candidates_jobsearchpreferences`.`status` IS NOT NULL)
                       OR (`candidates_candidate`.`last_seen` < 2020-12-18 09:19:42.863410
                           AND `candidates_jobsearchpreferences`.`status` = 0)))
                 OR (`candidates_candidate`.`is_hireable` = TRUE
                     AND `candidates_jobsearchpreferences`.`status` = 2))
                AND `auctions_opportunity`.`reviewed_at` < 2020-12-18 09:19:42.863329
                AND `auctions_opportunity`.`interview_status` = 1))
       AND `auctions_opportunity`.`employer_id` = 4
       AND `candidates_customaction`.`id` IS NULL
       AND `candidates_emailaction`.`id` IS NULL
       AND `candidates_saveaction`.`id` IS NULL
       AND (`candidates_hideaction`.`id` IS NULL
            OR `candidates_hideaction`.`is_deleted` = TRUE)
       AND `candidates_hireaction`.`id` IS NULL
       AND `auctions_opportunity`.`employer_id` = 4)
ORDER BY `auctions_opportunity`.`reviewed_at` DESC
LIMIT 30

Upvotes: 0

Views: 743

Answers (2)

Rick James
Rick James

Reputation: 142296

While it may seem like the ORDER BY is to blame, it may well be other things causing poor performance.

  • OR in WHERE and ON should be avoided where practical.

  • NOT x IN ( SELECT ... ) should be turned into either NOT EXISTS ( SELECT 1 ... ) or LEFT JOIN ( SELECT y ... ) ... WHERE y IS NULL

  • Some possible indexes to help:

      auctions_opportunity:  INDEX(employer_id, interview_status, job_id)
      candidates_saveaction:  INDEX(opportunity_id, id)
      profiles_profilestatus:  INDEX(name)
      candidates_jobsearchpreferences:  INDEX(status)
      U1:  INDEX(reply_email_at)
    

Based on big version of query

  • The 6 LEFT JOINs that constitute part of the filtering are probably part of the performance problem. Can the column being tested be moved into the main table?

  • There about two dozen ORs. Each inhibits the possibility of an index on whatever is being tested. For most of them, I don't have a constructive suggestion.

  • status = 0 implies status IS NOT NULL. I suspect some of the complex boolean expressions can be simplified.

  • Having a column that could be either NULL or 0, but interpreting those two 'values' leads to unnecessarily complex queries. Pick either NULL or 0, then map both to the picked value as the data is being inserted.

  • There are redundant tests such as AND NOT (auctions_opportunity.job_id IS NULL). Sure, the Optimizer might filter out the extra effort, but let's not trust it.

  • Can the flags is_hireable, is_private, availability be combined in some way that both makes the query simpler and still provides the details?

  • AND NOT (auth_user.email = [email protected]) is a syntax error -- are there other errors? (Quotes missing)

  • Instead of several LIKE '%...%', a single RLIKE '...|...' would probably be faster.

  • Do you need candidates_resume? It does not seem to be used.

  • The use of U1 seems strange. The LEFT JOIN is ON opportunity_id, yet the IS NULL test is on a different column (reply_email_at). This may not work as you expect.

  • Some of the following indexes may help.

      auctions_opportunity:  INDEX(employer_id, job_id, reviewed_at)
      jobs_job:  INDEX(is_active, locations, accept_outstation, id)
      profiles_profilestatus:  INDEX(name, id)
      candidates_jobsearchpreferences:  INDEX(candidate_id, status, current_location, candidate_id)
      blobinfotech:  INDEX(com)
      U1:  INDEX(opportunity_id, reply_email_at)
      U2:  INDEX(employer_id,  candidate_id)
    
  • Change

                AND  NOT (`auctions_opportunity`.`candidate_id` IN (
                          SELECT  U2.`candidate_id` AS Col1
                              FROM  `candidates_blockedemployerintermediate` U2
                              WHERE  U2.`employer_id` = 4)
                         )
    

to

     AND NOT EXISTS( SELECT 1 
              FROM `candidates_blockedemployerintermediate` U2
              WHERE `auctions_opportunity`.`candidate_id` =  U2.`candidate_id`
                AND  U2.`employer_id` = 4 )

Upvotes: 1

Thorsten Kettner
Thorsten Kettner

Reputation: 94939

The general answer is rather easy. If you take a telephone book and are asked to name thirty telephone numbers (LIMIT 30 without an ORDER BY) where the person's street name contains an A, how long would that take you? A few minutes probably. Now, you are asked to name the thirty lowest telephone numbers (LIMIT 30 with an ORDER BY) where the person's street name contains an A. You'll have to read the whole book, pick out all numbers where the street name matches, then sort all entries by number and then take the first thirty. How long will that take in comparison?

So, it's not at all surprising that you see a significant difference with and without using ORDER BY. It's expected.

In your case the DBMS may just pick rows with is_location_match = TRUE and thus avoid the costly location searches with LIKE for instance (if this suffices to find 30 rows in the end). It can't do so, when it must consider all matches in order to find your top 30 rows specified by ORDER BY.

Back to the telephone book: If the telephone book has only 50 entries, the job won't be too hard. If it has a million entries, searching it will take ages. So it must be our objective to deal with a small data set. You are joining the table candidates_resume without using it. (The condition AND NOT candidates_resume.id IS NULL is superfluous and makes no sense.) You are also joining all candidates_hideaction rows where is_deleted = TRUE. Let's say there are three resumes and five deleted hide actions per candidate. Thus you are blowing up your data set by the factor of 15 (3 x 5) only to dismiss the duplicates later with DISTINCT. DISTINCT is a very costly operation on large data sets. It is also a typical indicator for a badly written query, as in your case: It is unnecessary joins that blow up your intermediate result, and rather than fixing the join problem you are applying DISTINCT in the end.

You are also using many anti joins (e.g. LEFT OUTER JOIN candidates_customaction ... WHERE candidates_customaction.id IS NULL). This had me confused at first. I wondered why you are cross joining all those candidate actions until I found the IS NULL conditions. I consider these anti joins not readable. And now let's say that there ten actions for each of the four anti-joined action types per candidate. If the DBMS does the join, it will create 10000 (10 x 10 x 10 x 10) rows for a single candidate that must all get dismissed. I'd opt for NOT EXISTS clauses instead to get this more readable and maybe avoid an unnecessarily large intermediate result. I'd use NOT EXISTS for the check for undeleted hide actions, too.

Having said this, my recommendation is that you change the query such that you can dismiss DISTINCT. This can save the DBMS a lot of work and may show in the final execution speed.

There are some minor things you could do to get the query more readable. You don't have to state it thrice that you want employer_id = 4 :-) Then, status = 0 AND status IS NOT NULL is just status = 0 of course. And NOT (status = 0 AND status IS NOT NULL) OR (last_seen < xxx AND status = 0) is hence just status <> 0 OR last_seen < xxx. This won't speed up the query, but it will make it more readable and maintainable, and maybe you even wanted to consider status NULL and made a mistake that gets more obvious in the process of cleaning the conditions up.

Upvotes: 1

Related Questions