Efficiency of Django filter that mixes indexed and nonindexed fields?

Question

Suppose I have a model like:

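from django.db import models

class Widget(models.Model):
    # max_length is required on most backends; 100 is an arbitrary placeholder
    a = models.CharField(max_length=100, db_index=True, unique=True)
    b = models.CharField(max_length=100)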

and then I perform a filter like:

Widget.objects.filter(a=a, b=b).get()

Would you expect the performance of this to be (1) the same or (2) worse than if the model were instead defined as:

class Widget(models.Model):
    a = models.CharField(max_length=100, db_index=True, unique=True)
    b = models.CharField(max_length=100, db_index=True)  # <--- now b is indexed

I mean, logically, because a is indexed and unique, at most one result could be found for a given value of a, so the engine could just do the lookup on a and then check that the result's b field has the expected value, without using any index on b.

But in practice, will the generated SQL and underlying SQL engine be smart enough at query planning to figure that out? Is such query planning trivial?

willeM_ Van Onsem · Accepted Answer

There is no guarantee that an index is used at all. A database collects statistics on the selectivity of its indexes, and sometimes a full scan is more efficient, for example when the database expects the query to return a large portion of the rows. In that case it has to perform disk I/O to fetch most of the records anyway, so going through the index first can be slower.

would you expect the performance of this to be (1) the same; or (2) worse ?

It will likely be approximately the same, or slightly worse. If you index both fields, the database will normally pick the index on the field that it expects to narrow the search the most. It is possible that b looks better for this, in which case it will use the index on b and then check a on the rows it fetched.
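One way to see which index the planner picks is to ask the database for its query plan. The sketch below uses SQLite through Python's standard sqlite3 module as a stand-in; the table and index names (widget, widget_a_uniq, widget_b_idx) are made up for illustration, and PostgreSQL or MySQL may plan differently. On a real Django project, QuerySet.explain() reports the plan for whatever backend you actually use.

```python
import sqlite3


def plan_for(extra_ddl: str = "") -> str:
    """Return SQLite's query plan for the a = ? AND b = ? lookup as one string."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE widget (id INTEGER PRIMARY KEY, a TEXT NOT NULL, b TEXT NOT NULL);
        CREATE UNIQUE INDEX widget_a_uniq ON widget (a);
        """
        + extra_ddl
    )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM widget WHERE a = ? AND b = ?",
        ("x", "y"),
    ).fetchall()
    conn.close()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


# Hypothetical schema 1: only a is indexed (unique).
plan_without_b_index = plan_for()
# Hypothetical schema 2: b gets its own index as well.
plan_with_b_index = plan_for("CREATE INDEX widget_b_idx ON widget (b);")

print(plan_without_b_index)
print(plan_with_b_index)
```

With SQLite, both plans should name widget_a_uniq: an equality match on a unique index is estimated to return a single row, so the extra index on b does not get used for this query. The exact wording of the plan text varies between SQLite versions.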

You can, however, index both fields together. You do this with the indexes option of the model's Meta options:

class Widget(models.Model):
    a = models.CharField(max_length=100, db_index=True, unique=True)
    b = models.CharField(max_length=100, db_index=True)

    class Meta:
        indexes = [
            models.Index(fields=['a', 'b'])  # ← index together
        ]

That being said, if a (or b) alone already reduces the search space significantly, the speedup will be small. An index lets the database retrieve only the parts of the disk that it actually needs. If the second indexed field reduces the number of candidate rows but the database still has to read more or less the same disk segments, the gain will be limited, since I/O is (very) often the bottleneck.
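To illustrate what a combined index on (a, b) does on its own, here is a small sketch, again using SQLite via the standard sqlite3 module as an assumed stand-in for the real backend. The unique constraint on a is deliberately left out so that the combined index is the only candidate, and the name widget_a_b_idx is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE widget (id INTEGER PRIMARY KEY, a TEXT NOT NULL, b TEXT NOT NULL);
    -- Simplification: no unique index on a, only the combined index.
    CREATE INDEX widget_a_b_idx ON widget (a, b);
    """
)
rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM widget WHERE a = ? AND b = ?",
    ("x", "y"),
).fetchall()
conn.close()

# The last column of each row is the human-readable plan step.
composite_plan = " | ".join(row[-1] for row in rows)
print(composite_plan)
```

The plan should show both conditions being resolved inside the index (a=? AND b=?). When a is unique, as in the question, a lookup on a already yields at most one row, so this extra narrowing has little left to save.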
