Lucas Miller
Lucas Miller

Reputation: 123

Django prefetch_related optimize query but still very slow

I'm experiencing some severe performances issues with prefetch_related on a Model with 5 m2m fields and I'm pre-fetching also few nested m2m fields.

class TaskModelManager(models.Manager):
    def get_queryset(self):
        return super(TaskModelManager, self).get_queryset().exclude(internalStatus=2).prefetch_related("parent", "takes", "takes__flags", "assignedUser", "assignedUser__flags", "asset", "asset__flags", "status", "approvalWorkflow", "viewers", "requires", "linkedTasks", "activities")


class Task(models.Model):
    uuid = models.UUIDField(primary_key=True, default=genOptimUUID, editable=False)
    internalStatus = models.IntegerField(default=0)
    parent = models.ForeignKey("self", blank=True, null=True, related_name="childs")
    name = models.CharField(max_length=45)
    taskType = models.ForeignKey("TaskType", null=True)
    priority = models.IntegerField()
    startDate = models.DateTimeField()
    endDate = models.DateTimeField()
    status = models.ForeignKey("ProgressionStatus")
    assignedUser = models.ForeignKey("Asset", related_name="tasksAssigned")
    asset = models.ForeignKey("Asset", related_name="tasksSubject")
    viewers = models.ManyToManyField("Asset", blank=True, related_name="followedTasks")
    step = models.ForeignKey("Step", blank=True, null=True, related_name="tasks")
    approvalWorkflow = models.ForeignKey("ApprovalWorkflow")
    linkedTasks = models.ManyToManyField("self", symmetrical=False, blank=True, related_name="linkedTo")
    requires = models.ManyToManyField("self", symmetrical=False, blank=True, related_name="depends")

    objects = TaskModelManager()

The number of query is fine and the database query time is fine too, for exemple if I query 700 objects of my model i have 35 query and the average query time is 100~200ms but the total request time is approximately 8 seconds.

silk times

I've run some profiling and it pointed out that more than 80% of the time spent was on the prefetch_related_objects call.

profiling

I'm using Django==1.8.5 and djangorestframework==3.4.6

I'm open to any way to optimize this. Thanks in advance for your help.


Edit with select_related:

I've tried the improvement proposed by Alasdair

class TaskModelManager(models.Manager):
    def get_queryset(self):
        return super(TaskModelManager, self).get_queryset().exclude(internalStatus=2).select_related("parent", "status", "approvalWorkflow", "step").prefetch_related("takes", "takes__flags", "assignedUser", "assignedUser__flags", "asset", "asset__flags", "viewers", "requires", "linkedTasks", "activities")

The new result is still 8 seconds for the request with 32 queries and 150ms of query time.


Edit :

It seems that a ticket was opened on Django issue tracker 4 years ago and is still open : https: //code.djangoproject.com/ticket/20577

Upvotes: 9

Views: 7793

Answers (3)

Claudio Catterina
Claudio Catterina

Reputation: 337

I ran into the same problem.

Following the issue you linked i found that you can improve the prefetch_related performance using Prefetch object and to_attr argument.

From the commit that introduces the Prefetch object:

When a Prefetch instance specifies a to_attr argument, the result is stored in a list rather than a QuerySet. This has the fortunate consequence of being significantly faster. The preformance improvement is due to the fact that we save the costly creation of a QuerySet instance.

So i significantly improved my code (from about 7 seconds to 0.88 seconds) simply by calling:

Foo.objects.filter(...).prefetch_related(Prefetch('bars', to_attr='bars_list'))

instead of

Foo.objects.filter(...).prefetch_related('bars')

Upvotes: 14

Apollo Data
Apollo Data

Reputation: 1426

If the DB is 150ms but your request is 8 seconds, it's not your query (in itself, at least). A few possible issues:

1) Your HTML or template is too complex, spending too much time in generating the response. Or consider template caching.

2) All those objects are complex and you load too many fields, so while the query is fast, sending it and processing all those objects in Python is slow. Explore using only(), defer() and values() or value_list() to only load what you need.

Optimization is hard and we'd need more details to give you a better idea. I'd suggest installing Django Debug Toolbar (Django app) or Opbeat (3rd party utility), they can help you detect where your time is spent and then you can optimize accordingly.

Upvotes: -1

Alasdair
Alasdair

Reputation: 309039

Try using select_related for foreign keys like parent and ApprovalWorkflow instead of prefetch_related.

When you use select_related, Django will fetch the models using a join, unlike prefetch_related which causes an extra query. You might find that this improves performance.

Upvotes: 1

Related Questions