xnx
xnx

Reputation: 25550

Django with two __in queries slower than retrieving separate querysets

I have a reasonably complex database consisting of a model, Bar which may have several Spams, implemented by a ForeignKey relationship (each Spam knows which Bar it belongs to). A further model, Foo has exactly two Bars. Here's the relevant bit of models.py:

class Bar(models.Model):
    g = models.IntegerField()
    ...

class Spam(models.Model):
    name = models.CharField(max_lendth=20)
    val = models.CharField(max_length=10)
    bar = models.ForeignKey(bar)

class Foo(models.Model):
    bar1 = models.ForeignKey(Bar, related_name='foo_bar_2')
    bar2 = models.ForeignKey(Bar, related_name='foo_bar_1')

I'd like to know which Foo has a particular pair of bars, where each is identified by its set of Spams. Getting the bars matching the two sets is fine and reasonably fast:

def get_bars_queryset(spam_tuples):
    bars = Bar.objects
    for name, val in spam_tuples:
        bars = bars.filter(spam__name=name, spam__val=val)
    return bars

    bars1 =  get_bars_queryset(spam_tuples=[('eggs', '5'), ('bacon', 'yes')])
    bars2 =  get_bars_queryset(spam_tuples=[('eggs', '1'), ('toast': '4'), ('milk', '2')])

But when I try the obvious way of retrieving the Foo I want:

foos = Foo.objects.filter(bar1__in=bars1, bar2__in=bars2)

the performance is very bad. It turns out to be much quicker to retrieve the two sets of Foos separately and find their union:

foos1 = Foo.objects.filter(bar1__in=bars1)
foos2 = Foo.objects.filter(bar2__in=bars2)

foo_bar1_ids = set(foos1.values_list('pk', flat=True))
foo_bar2_ids = set(foos2.values_list('pk', flat=True))
foo_id = foo_bar1_ids & foo_bar2_ids
foos = Foo.objects.filter(pk__in=foo_id)

I'm guessing this is because the native Django approach is not joining my tables efficiently. How can I force it to?

Upvotes: 1

Views: 668

Answers (1)

xnx
xnx

Reputation: 25550

Well, in case it helps anyone else, the solution was to force evaluation of the bar querysets:

bars1 =  list(get_bars_queryset(spam_tuples=[('eggs', '5'), ('bacon', 'yes')]))
bars2 =  list(get_bars_queryset(spam_tuples=[('eggs', '1'), ('toast': '4'),
                                             ('milk', '2')]))

before filtering the Foos with

foos = Foo.objects.filter(bar1__in=bars1, bar2__in=bars2)

This is hinted at in the documentation (Performance considerations): https://docs.djangoproject.com/en/1.7/ref/models/querysets/#in as particularly a problem for MySQL. Thanks to @Minh-Hung Nguyen for pointing me in thr right direction.

Upvotes: 1

Related Questions