Reputation: 2797
Suppose I have the following table:
import uuid

from django.db.models import CharField, Model, UUIDField
from django.utils.translation import gettext_lazy as _


class Word(Model):
    id = UUIDField(primary_key=True,
                   default=uuid.uuid4,
                   editable=False)
    name = CharField(_('Word'),
                     db_index=True,
                     max_length=250)
This table contains around 3 million rows. The content is mainly Cyrillic words.
What I want to accomplish is to search for a predefined list of values. The problem is that the words in the table are accented.
The content of the table looks like this:
мо́рква
буря́к
я́блуко
But I query without accents:
words = ['морква', 'буряк', 'яблуко']
import operator
from functools import reduce
from django.db.models import F, Func, Q, Value

# ACCENTS is a string of accent characters, defined elsewhere
Word.objects.annotate(name_without_accent=Func(
    F('name'), Value('[%s]' % ACCENTS), Value(''), function='regexp_replace',
)).filter(reduce(operator.or_,
                 [Q(name_without_accent__icontains=w) for w in words], Q()))
The problem is that this query runs very slowly on such a big table. The words list can contain more elements (around 20-30).
Is there any way to improve the overall performance of querying such a big table? Improve the Django query? Tune Postgres? Use some external tools?
Upvotes: 0
Views: 176
Reputation: 2686
Try using unicodedata.normalize('NFD', word)
instead of the regexp. And if you run a lot of such queries, it would be reasonable to add an extra field with un-accented versions of the words and build an index on that field, because in your situation db_index doesn't help at all; it just slows the table down with reindexing.
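For illustration, here is a minimal sketch of that idea; the strip_accents helper, the name_unaccented field, and the save() override are assumed names for this example, not something from the question:

import unicodedata
import uuid

from django.db.models import CharField, Model, UUIDField
from django.utils.translation import gettext_lazy as _


def strip_accents(text):
    # NFD splits each accented letter into a base letter plus combining
    # marks; dropping the combining marks leaves the plain word.
    return ''.join(
        ch for ch in unicodedata.normalize('NFD', text)
        if not unicodedata.combining(ch)
    )


class Word(Model):
    id = UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = CharField(_('Word'), max_length=250)
    # Denormalized, indexed copy of the word without accents; this is the
    # column the queries should hit.
    name_unaccented = CharField(max_length=250, db_index=True, editable=False)

    def save(self, *args, **kwargs):
        self.name_unaccented = strip_accents(self.name)
        super().save(*args, **kwargs)

Exact lookups can then use the index directly, without any per-row regexp_replace:

words = ['морква', 'буряк', 'яблуко']
Word.objects.filter(name_unaccented__in=[strip_accents(w) for w in words])

(If substring matching is really needed, the original reduce/Q icontains filter can be pointed at name_unaccented instead, but a plain B-tree index only helps exact or prefix matches.)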
Upvotes: 2