Reputation: 60
Hi I am having one table with approximately one billion rows. Doing ORDER BY on indexed field is taking around 3 seconds to fetch 30 records with LIMIT, whereas without ORDER BY it takes 195ms. I want to speed up this.
can anyone help me out with this ?
here is the simplified version of query (have removed some fields and few joins).
SELECT DISTINCT `auctions_opportunity`.`id`,
`auctions_opportunity`.`employer_id`,
`auctions_opportunity`.`salary`,
`auctions_opportunity`.`is_active`,
`auctions_opportunity`.`is_interested`,
`auctions_opportunity`.`interview_status`,
`auctions_opportunity`.`previous_interview_status`,
`auctions_opportunity`.`creation_source`,
`auctions_opportunity`.`created_at`,
`auctions_opportunity`.`last_modified`,
`auctions_opportunity`.`last_instant_alert_email_at`,
`auctions_opportunity`.`last_daily_alert_email_at`,
`auctions_opportunity`.`last_periodic_alert_email_at`,
`auctions_opportunity`.`reviewed_at`,
`auctions_opportunity`.`last_modified_by_id`,
`auctions_opportunity`.`candidate_id`,
`auctions_opportunity`.`job_id`,
`auctions_opportunity`.`interview_request_notes`,
`auctions_opportunity`.`application_email_at`,
`auctions_opportunity`.`batch_application_email_at`,
`auctions_opportunity`.`is_location_match`,
`auctions_opportunity`.`is_strong_match`,
`auctions_opportunity`.`score`,
`auctions_opportunity`.`es_score`,
`auctions_opportunity`.`message`,
`auctions_opportunity`.`es_maybe`,
`candidates_candidate`.`id`,
`candidates_candidate`.`user_id`,
`candidates_candidate`.`phone`,
`candidates_candidate`.`is_new`,
`candidates_candidate`.`last_seen`,
`candidates_candidate`.`last_agreed_to_terms_at`,
`candidates_candidate`.`email_backend_status`,
`candidates_candidate`.`email_suppressed_at`,
`candidates_candidate`.`email_verified_at`,
`candidates_candidate`.`number_verified_at`,
`candidates_candidate`.`last_emailed_at`,
`candidates_candidate`.`last_updated`,
`candidates_candidate`.`internal_note`,
`candidates_candidate`.`main_skills`,
`candidates_candidate`.`main_skills_nopunc`,
`candidates_candidate`.`total_experience`,
`candidates_candidate`.`current_company`,
`candidates_candidate`.`current_company_nopunc`,
`candidates_candidate`.`current_designation`,
`candidates_candidate`.`onboarding_completed_at`,
`candidates_candidate`.`previously_onboarded`,
`candidates_candidate`.`talent_advocate_id`,
`candidates_candidate`.`job_function_skills`,
`candidates_candidate`.`availability`,
`candidates_candidate`.`deactivated_at`,
`candidates_candidate`.`deactivated_by_id`,
`candidates_candidate`.`deactivation_source`,
`candidates_candidate`.`is_private`,
`candidates_candidate`.`previous_companies`,
`candidates_candidate`.`companies_interned_at`,
`candidates_candidate`.`gender`,
`candidates_candidate`.`skype_id`,
`candidates_candidate`.`alternate_phone`,
`candidates_candidate`.`shadow_linkedin`,
`candidates_candidate`.`last_job_post_email`,
`candidates_candidate`.`last_job_alert_email`,
`candidates_candidate`.`last_joining_email_sent`,
`candidates_candidate`.`last_hired_email_sent`,
`candidates_candidate`.`last_reonboarding_email`,
`candidates_candidate`.`last_indexed_at`,
`candidates_candidate`.`resume_viewed_notification_type`,
`candidates_candidate`.`last_resume_viewed_email_at`,
`candidates_candidate`.`seen_go_premium_modal_at`,
`candidates_candidate`.`seen_active_check_modal_at`,
`candidates_candidate`.`alerts_limit_reached_at`,
`candidates_candidate`.`calculation_done_at`,
`candidates_candidate`.`calculation_attempted_at`,
`candidates_candidate`.`bio`,
`candidates_candidate`.`last_seen_activity_at`,
`candidates_candidate`.`job_unsubscribed_at`,
`candidates_candidate`.`monthly_alerts_unsubscribed_at`,
`candidates_candidate`.`resume_views_unsubscribed_at`,
`candidates_candidate`.`update_preferences_unsubscribed_at`,
`candidates_candidate`.`onboarding_reminder_unsubscribed_at`,
`candidates_candidate`.`is_hireable`,
`candidates_candidate`.`recruiter_message_push_unsubscribed_at`,
`candidates_candidate`.`resume_views_email_unsubscribed_at`,
`candidates_candidate`.`resume_views_push_unsubscribed_at`,
`candidates_candidate`.`profile_reminder_email_unsubscribed_at`,
`candidates_candidate`.`profile_reminder_push_unsubscribed_at`,
`candidates_candidate`.`newsletter_email_unsubscribed_at`,
`candidates_candidate`.`newsletter_push_unsubscribed_at`,
`candidates_candidate`.`product_updates_email_unsubscribed_at`,
`candidates_candidate`.`product_updates_push_unsubscribed_at`,
`candidates_candidate`.`push_notifications_shown_at`,
`candidates_candidate`.`push_notifications_enabled`,
`candidates_candidate`.`push_notifications_verified_at`,
`candidates_candidate`.`whatsapp_number`,
`candidates_candidate`.`whatsapp_enabled`,
`candidates_candidate`.`whatsapp_verified_at`,
`candidates_candidate`.`last_whatsapp_sent_at`,
`auth_user`.`id`,
`auth_user`.`password`,
`auth_user`.`last_login`,
`auth_user`.`is_superuser`,
`auth_user`.`username`,
`auth_user`.`first_name`,
`auth_user`.`last_name`,
`auth_user`.`email`,
`auth_user`.`is_staff`,
`auth_user`.`is_active`,
`auth_user`.`date_joined`
FROM `auctions_opportunity`
INNER JOIN `employers_employer` ON (`auctions_opportunity`.`employer_id` = `employers_employer`.`id`)
INNER JOIN `jobs_job` ON (`auctions_opportunity`.`job_id` = `jobs_job`.`id`)
INNER JOIN `profiles_profilestatus` ON (`employers_employer`.`status_id` = `profiles_profilestatus`.`id`)
INNER JOIN `candidates_candidate` ON (`auctions_opportunity`.`candidate_id` = `candidates_candidate`.`id`)
INNER JOIN `auth_user` ON (`candidates_candidate`.`user_id` = `auth_user`.`id`)
INNER JOIN `candidates_resume` ON (`candidates_candidate`.`id` = `candidates_resume`.`candidate_id`)
LEFT OUTER JOIN `candidates_jobsearchpreferences` ON (`candidates_candidate`.`id` = `candidates_jobsearchpreferences`.`candidate_id`)
LEFT OUTER JOIN `candidates_customaction` ON (`auctions_opportunity`.`id` = `candidates_customaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_emailaction` ON (`auctions_opportunity`.`id` = `candidates_emailaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_saveaction` ON (`auctions_opportunity`.`id` = `candidates_saveaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_hideaction` ON (`auctions_opportunity`.`id` = `candidates_hideaction`.`opportunity_id`)
LEFT OUTER JOIN `candidates_hireaction` ON (`auctions_opportunity`.`id` = `candidates_hireaction`.`opportunity_id`)
WHERE (`auctions_opportunity`.`employer_id` = 4
AND NOT (`auctions_opportunity`.`job_id` IS NULL)
AND `profiles_profilestatus`.`name` = Approved
AND NOT (`auth_user`.`email` = [email protected])
AND NOT (`candidates_candidate`.`availability` = 3)
AND NOT (`auctions_opportunity`.`job_id` IS NULL)
AND NOT (`candidates_resume`.`id` IS NULL)
AND (`jobs_job`.`is_active` = TRUE
AND `auctions_opportunity`.`is_active` = TRUE)
AND ((((`candidates_candidate`.`is_hireable` = TRUE
AND `candidates_candidate`.`last_seen` >= 2020-12-18 09:19:42.873898
AND `candidates_jobsearchpreferences`.`status` = 0)
OR (`candidates_candidate`.`is_hireable` = TRUE
AND (NOT (`candidates_jobsearchpreferences`.`status` = 0)
OR (`candidates_candidate`.`last_seen` < 2020-12-18 09:19:42.873989
AND `candidates_jobsearchpreferences`.`status` = 0))))
AND `candidates_candidate`.`is_private` = FALSE)
OR `auctions_opportunity`.`interview_status` = 1
OR NOT (`auctions_opportunity`.`id` IN
(SELECT U0.`id` AS Col1
FROM `auctions_opportunity` U0
LEFT OUTER JOIN `candidates_emailaction` U1 ON (U0.`id` = U1.`opportunity_id`)
WHERE (U1.`reply_email_at` IS NULL
AND U0.`id` = (`auctions_opportunity`.`id`)))))
AND NOT (`auctions_opportunity`.`candidate_id` IN
(SELECT U2.`candidate_id` AS Col1
FROM `candidates_blockedemployerintermediate` U2
WHERE U2.`employer_id` = 4))
AND `auctions_opportunity`.`job_id` IN (40729)
AND NOT (((`auctions_opportunity`.`creation_source` = 8
AND `auctions_opportunity`.`creation_source` IS NOT NULL)
OR (`auctions_opportunity`.`interview_status` = 1
AND `auctions_opportunity`.`is_active` = FALSE)))
AND ((((`candidates_candidate`.`is_hireable` = TRUE
AND (NOT (`candidates_jobsearchpreferences`.`status` = 0
AND `candidates_jobsearchpreferences`.`status` IS NOT NULL)
OR (`candidates_candidate`.`last_seen` < 2020-12-18 09:19:42.863410
AND `candidates_jobsearchpreferences`.`status` = 0)))
OR (`candidates_candidate`.`is_hireable` = TRUE
AND `candidates_jobsearchpreferences`.`status` = 2))
AND `candidates_candidate`.`is_private` = FALSE
AND `auctions_opportunity`.`interview_status` = 0
AND (`auctions_opportunity`.`is_location_match` = TRUE
OR ((`candidates_jobsearchpreferences`.`current_location` LIKE %Delhi%
OR `candidates_jobsearchpreferences`.`current_location` LIKE %Noida%
OR `candidates_jobsearchpreferences`.`current_location` LIKE %Gurgaon%
OR `candidates_jobsearchpreferences`.`current_location` LIKE %Faridabad%
OR `candidates_jobsearchpreferences`.`current_location` LIKE %Greater Noida%)
AND (`jobs_job`.`locations` LIKE %Delhi%
OR `jobs_job`.`locations` LIKE %Noida%
OR `jobs_job`.`locations` LIKE %Gurgaon%
OR `jobs_job`.`locations` LIKE %Faridabad%
OR `jobs_job`.`locations` LIKE %Greater Noida%)
AND `jobs_job`.`accept_outstation` = FALSE)))
OR (((`candidates_candidate`.`is_hireable` = TRUE
AND (NOT (`candidates_jobsearchpreferences`.`status` = 0
AND `candidates_jobsearchpreferences`.`status` IS NOT NULL)
OR (`candidates_candidate`.`last_seen` < 2020-12-18 09:19:42.863410
AND `candidates_jobsearchpreferences`.`status` = 0)))
OR (`candidates_candidate`.`is_hireable` = TRUE
AND `candidates_jobsearchpreferences`.`status` = 2))
AND `auctions_opportunity`.`reviewed_at` < 2020-12-18 09:19:42.863329
AND `auctions_opportunity`.`interview_status` = 1))
AND `auctions_opportunity`.`employer_id` = 4
AND `candidates_customaction`.`id` IS NULL
AND `candidates_emailaction`.`id` IS NULL
AND `candidates_saveaction`.`id` IS NULL
AND (`candidates_hideaction`.`id` IS NULL
OR `candidates_hideaction`.`is_deleted` = TRUE)
AND `candidates_hireaction`.`id` IS NULL
AND `auctions_opportunity`.`employer_id` = 4)
ORDER BY `auctions_opportunity`.`reviewed_at` DESC
LIMIT 30
Upvotes: 0
Views: 743
Reputation: 142296
While it may seem like the ORDER BY
is to blame, it may well be other things causing poor performance.
OR
in WHERE
and ON
should be avoided where practical.
NOT x IN ( SELECT ... )
should be turned into either NOT EXISTS ( SELECT 1 ... )
or LEFT JOIN ( SELECT y ... ) ... WHERE y IS NULL
Some possible indexes to help:
auctions_opportunity: INDEX(employer_id, interview_status, job_id)
candidates_saveaction: INDEX(opportunity_id, id)
profiles_profilestatus: INDEX(name)
candidates_jobsearchpreferences: INDEX(status)
U1: INDEX(reply_email_at)
Based on big version of query
The 6 LEFT JOINs
that constitute part of the filtering are probably part of the performance problem. Can the column being tested be moved into the main table?
There about two dozen ORs
. Each inhibits the possibility of an index on whatever is being tested. For most of them, I don't have a constructive suggestion.
status = 0
implies status IS NOT NULL
. I suspect some of the complex boolean expressions can be simplified.
Having a column that could be either NULL
or 0
, but interpreting those two 'values' leads to unnecessarily complex queries. Pick either NULL
or 0
, then map both to the picked value as the data is being inserted.
There are redundant tests such as AND NOT (
auctions_opportunity.
job_id IS NULL)
. Sure, the Optimizer might filter out the extra effort, but let's not trust it.
Can the flags is_hireable
, is_private
, availability
be combined in some way that both makes the query simpler and still provides the details?
AND NOT (
auth_user.
email = [email protected])
is a syntax error -- are there other errors? (Quotes missing)
Instead of several LIKE '%...%'
, a single RLIKE '...|...'
would probably be faster.
Do you need candidates_resume
? It does not seem to be used.
The use of U1
seems strange. The LEFT JOIN
is ON opportunity_id
, yet the IS NULL
test is on a different column (reply_email_at
). This may not work as you expect.
Some of the following indexes may help.
auctions_opportunity: INDEX(employer_id, job_id, reviewed_at)
jobs_job: INDEX(is_active, locations, accept_outstation, id)
profiles_profilestatus: INDEX(name, id)
candidates_jobsearchpreferences: INDEX(candidate_id, status, current_location, candidate_id)
blobinfotech: INDEX(com)
U1: INDEX(opportunity_id, reply_email_at)
U2: INDEX(employer_id, candidate_id)
Change
AND NOT (`auctions_opportunity`.`candidate_id` IN (
SELECT U2.`candidate_id` AS Col1
FROM `candidates_blockedemployerintermediate` U2
WHERE U2.`employer_id` = 4)
)
to
AND NOT EXISTS( SELECT 1
FROM `candidates_blockedemployerintermediate` U2
WHERE `auctions_opportunity`.`candidate_id` = U2.`candidate_id`
AND U2.`employer_id` = 4 )
Upvotes: 1
Reputation: 94939
The general answer is rather easy. If you take a telephone book and are asked to name thirty telephone numbers (LIMIT 30
without an ORDER BY
) where the person's street name contains an A, how long would that take you? A few minutes probably. Now, you are asked to name the thirty lowest telephone numbers (LIMIT 30
with an ORDER BY
) where the person's street name contains an A. You'll have to read the whole book, pick out all numbers where the street name matches, then sort all entries by number and then take the first thirty. How long will that take in comparison?
So, it's not at all surprising that you see a significant difference with and without using ORDER BY
. It's expected.
In your case the DBMS may just pick rows with is_location_match = TRUE
and thus avoid the costly location searches with LIKE
for instance (if this suffices to find 30 rows in the end). It can't do so, when it must consider all matches in order to find your top 30 rows specified by ORDER BY
.
Back to the telephone book: If the telephone book has only 50 entries, the job won't be too hard. If it has a million entries, searching it will take ages. So it must be our objective to deal with a small data set. You are joining the table candidates_resume
without using it. (The condition AND NOT candidates_resume.id IS NULL
is superfluous and makes no sense.) You are also joining all candidates_hideaction
rows where is_deleted = TRUE
. Let's say there are three resumes and five deleted hide actions per candidate. Thus you are blowing up your data set by the factor of 15 (3 x 5) only to dismiss the duplicates later with DISTINCT
. DISTINCT
is a very costly operation on large data sets. It is also a typical indicator for a badly written query, as in your case: It is unnecessary joins that blow up your intermediate result, and rather than fixing the join problem you are applying DISTINCT
in the end.
You are also using many anti joins (e.g. LEFT OUTER JOIN candidates_customaction
... WHERE candidates_customaction.id IS NULL
). This had me confused at first. I wondered why you are cross joining all those candidate actions until I found the IS NULL
conditions. I consider these anti joins not readable. And now let's say that there ten actions for each of the four anti-joined action types per candidate. If the DBMS does the join, it will create 10000 (10 x 10 x 10 x 10) rows for a single candidate that must all get dismissed. I'd opt for NOT EXISTS
clauses instead to get this more readable and maybe avoid an unnecessarily large intermediate result. I'd use NOT EXISTS
for the check for undeleted hide actions, too.
Having said this, my recommendation is that you change the query such that you can dismiss DISTINCT
. This can save the DBMS a lot of work and may show in the final execution speed.
There are some minor things you could do to get the query more readable. You don't have to state it thrice that you want employer_id = 4
:-) Then, status = 0 AND status IS NOT NULL
is just status = 0
of course. And NOT (status = 0 AND status IS NOT NULL) OR (last_seen < xxx AND status = 0)
is hence just status <> 0 OR last_seen < xxx
. This won't speed up the query, but it will make it more readable and maintainable, and maybe you even wanted to consider status NULL and made a mistake that gets more obvious in the process of cleaning the conditions up.
Upvotes: 1