Reputation: 766
I have two simple models:
class A(models.Model):
name_a = models.CharField(
_("name_a"),
max_length=255)
b = models.ManyToManyField(
'B',
related_name='a',
blank=True
)
class B(models.Model):
name_b = models.CharField(
_("name_b"),
max_length=255)
I created 1000 records for A:
for i in range(1000):
A.objects.create()
and 3 objects for B:
for i in range(3):
B.objects.create()
and connect each Object of b with each object of a through the m2m relation:
for a in A.objects.all().iterator():
a.b.add(B.objects.all()[0])
a.b.add(B.objects.all()[1])
a.b.add(B.objects.all()[2])
Now for each a object, I want to get all b objects:
import time
start = time.time()
objects = A.objects.all()
for n in objects.iterator():
list(n.b.all())
print(time.time() - start)
Output: 2.642864465713501
So it takes over 2 seconds for this query for just 1000 a objects. The performance is horrible. I have over 1000000 a objects in production.
I tried to increase the performance by using prefetch_related:
import time
start = time.time()
objects = A.objects.all().prefetch_related('b')
for n in objects.iterator():
list(n.b.all())
print(time.time() - start)
Output: 2.684298038482666
But that helps nothing. Why is it so slow and how can I improve the performance?
Upvotes: 3
Views: 3363
Reputation: 10957
When running this I'm around 0.7 sec with sqlite. Time reduces by 50% if I omit the list creation for every query.
Point is that you hit the DB as many times as you have A
objects.
So the best bet for performance increase is reducing the number of queries you make.
But here it actually matters what exactly you intend to do.
As this in unclear, from here on it's more or less just guessing what could work for you...
Maybe just iterate through A.objects.values_list('id')
and query B
instead as you don't really use the a
objects:
bq = B.objects.all()
for a_id, in A.objects.values_list('id').iterator():
list(bq.filter(b__id=a_id)) # maybe correct your related_name to 'a' so this would look a__id=a_id
print(time.time() - start)
or
start = time.time()
a_ids = A.objects.values_list('id').all()
bq = B.objects.all()
[list(bq.filter(b__id=a_id)) for a_id, in a_ids]
print(time.time() - start)
If you just want all b
objects related to some of your a
objects, say, for example, all with name_a
contains 'foo'
:
B.objects.filter(b__name_a__contains='foo').all().distinct()
Hope these suggestions might help
Upvotes: 2
Reputation: 32244
From the Django docs for prefetch_related
Note that if you use iterator() to run the query, prefetch_related() calls will be ignored since these two optimizations do not make sense together.
Upvotes: 0