Reputation: 25550
I have a reasonably complex database consisting of a model, Bar
which may have several Spam
s, implemented by a ForeignKey relationship (each Spam
knows which Bar
it belongs to). A further model, Foo
has exactly two Bars
. Here's the relevant bit of models.py
:
class Bar(models.Model):
g = models.IntegerField()
...
class Spam(models.Model):
name = models.CharField(max_lendth=20)
val = models.CharField(max_length=10)
bar = models.ForeignKey(bar)
class Foo(models.Model):
bar1 = models.ForeignKey(Bar, related_name='foo_bar_2')
bar2 = models.ForeignKey(Bar, related_name='foo_bar_1')
I'd like to know which Foo
has a particular pair of bar
s, where each is identified by its set of Spams
. Getting the bars
matching the two sets is fine and reasonably fast:
def get_bars_queryset(spam_tuples):
bars = Bar.objects
for name, val in spam_tuples:
bars = bars.filter(spam__name=name, spam__val=val)
return bars
bars1 = get_bars_queryset(spam_tuples=[('eggs', '5'), ('bacon', 'yes')])
bars2 = get_bars_queryset(spam_tuples=[('eggs', '1'), ('toast': '4'), ('milk', '2')])
But when I try the obvious way of retrieving the Foo
I want:
foos = Foo.objects.filter(bar1__in=bars1, bar2__in=bars2)
the performance is very bad. It turns out to be much quicker to retrieve the two sets of Foo
s separately and find their union:
foos1 = Foo.objects.filter(bar1__in=bars1)
foos2 = Foo.objects.filter(bar2__in=bars2)
foo_bar1_ids = set(foos1.values_list('pk', flat=True))
foo_bar2_ids = set(foos2.values_list('pk', flat=True))
foo_id = foo_bar1_ids & foo_bar2_ids
foos = Foo.objects.filter(pk__in=foo_id)
I'm guessing this is because the native Django approach is not joining my tables efficiently. How can I force it to?
Upvotes: 1
Views: 668
Reputation: 25550
Well, in case it helps anyone else, the solution was to force evaluation of the bar
querysets:
bars1 = list(get_bars_queryset(spam_tuples=[('eggs', '5'), ('bacon', 'yes')]))
bars2 = list(get_bars_queryset(spam_tuples=[('eggs', '1'), ('toast': '4'),
('milk', '2')]))
before filtering the Foos
with
foos = Foo.objects.filter(bar1__in=bars1, bar2__in=bars2)
This is hinted at in the documentation (Performance considerations): https://docs.djangoproject.com/en/1.7/ref/models/querysets/#in as particularly a problem for MySQL. Thanks to @Minh-Hung Nguyen for pointing me in thr right direction.
Upvotes: 1