GustavoNogueira
GustavoNogueira

Reputation: 399

How optimze complex query django 1.8

I have a query taking about 36 seconds to run, this is unviable, I've tried using select_related and prefetch, but the results don't improve.

This my query:

job_list = Job.objects.filter(Q(hiring_manager=emp) | Q(
                        followers=emp)).values_list('pk', flat=True)

emp is:

emp = Employee.objects.filter(
                        user=self.request.user, firm=obj).first().pk

Then i use in:

_candidates = Candidate.objects.filter(
                        job__in=job_list,
                        apply_date__range=[date_initial, date_end]
                    ).order_by('-apply_date')

And ater:

_candidates.count()

This SQL:

SQL Executed:
SELECT ...
FROM `combo_candidate` WHERE (
    `combo_candidate`.`job_id` IN (
        SELECT ... FROM `combo_job` U0
        LEFT OUTER JOIN `combo_job_followers` U2 ON U0.`id` = U2.`job_id`
        WHERE U0.`hiring_manager_id` = 9482 OR U2.`user_id` = 9482
          AND `combo_candidate`.`apply_date` BETWEEN '2019-01-01 02:00:00' AND '2019-02-01 02:00:00'
    )
Time
33402,8768539 ms

hiring_manager is a FK in Job, followers is m2m in model Job

Upvotes: 2

Views: 152

Answers (2)

Amit
Amit

Reputation: 2090

Excellent suggestion by @WillemVanOnsem. I am just attaching the Django documentation.

As per the Django documentation

Performance considerations

Be cautious about using nested queries and understand your database server’s performance characteristics (if in doubt, benchmark!). Some database backends, most notably MySQL, don’t optimize nested queries very well. It is more efficient, in those cases, to extract a list of values and then pass that into the second query. That is, execute two queries instead of one:


values = Blog.objects.filter(
        name__contains='Cheddar').values_list('pk', flat=True)

entries = Entry.objects.filter(blog__in=list(values))

Note the list() call around the Blog QuerySet to force execution of the first query. Without it, a nested query would be executed, because QuerySets are lazy.

I place special emphasis on the "Note the list() call around the Blog QuerySet to force execution of the first query. Without it, a nested query would be executed, because QuerySets are lazy."

Hope it helps

Upvotes: 2

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 476503

I think the reason that this takes so long is because the database runs the subquery for each element in your combo_candidate table, and thus if the number of candidates is high, then it will make a lot of subqueries.

The subquery is however "static": it does not require to be evaluated multiple times, since nothing in the subquery refers to the world "outside" the query. We thus can make two queries here: one where evaluate the subquery, and one where we evaluate the outer query.

We can force evaluation, for example by using list(..):

_candidates = Candidate.objects.filter(
    job__in=list(job_list),
    apply_date__range=[date_initial, date_end]
).order_by('-apply_date')

Upvotes: 2

Related Questions