NMO
NMO

Reputation: 766

ManyToMany slow in Django

I have two simple models:

class A(models.Model):
name_a = models.CharField(
    _("name_a"),
    max_length=255)

    b = models.ManyToManyField(
        'B',
        related_name='a',
        blank=True
    )

class B(models.Model):
    name_b = models.CharField(
        _("name_b"),
        max_length=255)

I created 1000 records for A:

for i in range(1000):
    A.objects.create()

and 3 objects for B:

for i in range(3):
    B.objects.create()

and connect each Object of b with each object of a through the m2m relation:

for a in A.objects.all().iterator(): 
    a.b.add(B.objects.all()[0]) 
    a.b.add(B.objects.all()[1]) 
    a.b.add(B.objects.all()[2])

Now for each a object, I want to get all b objects:

import time 
start = time.time()
objects = A.objects.all()
for n in objects.iterator(): 
     list(n.b.all())
print(time.time() - start)

Output: 2.642864465713501

So it takes over 2 seconds for this query for just 1000 a objects. The performance is horrible. I have over 1000000 a objects in production.

I tried to increase the performance by using prefetch_related:

import time 
start = time.time()
objects = A.objects.all().prefetch_related('b')
for n in objects.iterator(): 
     list(n.b.all())
print(time.time() - start)

Output: 2.684298038482666

But that helps nothing. Why is it so slow and how can I improve the performance?

Upvotes: 3

Views: 3363

Answers (2)

j-i-l
j-i-l

Reputation: 10957

When running this I'm around 0.7 sec with sqlite. Time reduces by 50% if I omit the list creation for every query.

Point is that you hit the DB as many times as you have A objects. So the best bet for performance increase is reducing the number of queries you make. But here it actually matters what exactly you intend to do. As this in unclear, from here on it's more or less just guessing what could work for you...

Maybe just iterate through A.objects.values_list('id') and query B instead as you don't really use the a objects:

bq = B.objects.all()
for a_id, in A.objects.values_list('id').iterator():
    list(bq.filter(b__id=a_id))  # maybe correct your related_name to 'a' so this would look a__id=a_id
print(time.time() - start)

or

start = time.time()
a_ids = A.objects.values_list('id').all()
bq = B.objects.all()
[list(bq.filter(b__id=a_id)) for a_id, in a_ids]
print(time.time() - start)

If you just want all b objects related to some of your a objects, say, for example, all with name_a contains 'foo':

B.objects.filter(b__name_a__contains='foo').all().distinct()

Hope these suggestions might help

Upvotes: 2

Iain Shelvington
Iain Shelvington

Reputation: 32244

From the Django docs for prefetch_related

Note that if you use iterator() to run the query, prefetch_related() calls will be ignored since these two optimizations do not make sense together.

Upvotes: 0

Related Questions