Reputation: 2797
Suppose I have the following table:
import uuid

from django.db.models import CharField, Model, UUIDField
from django.utils.translation import gettext_lazy as _


class Word(Model):
    id = UUIDField(primary_key=True,
                   default=uuid.uuid4,
                   editable=False)
    name = CharField(_('Word'),
                     db_index=True,
                     max_length=250)
This table contains around 3 million rows. The content is mainly Cyrillic words.
What I want to accomplish is to search for a predefined list of values. The problem is that the words in the table are accented.
The content of the table looks like this:
мо́рква
буря́к
я́блуко
But I query without accents:
words = ['морква', 'буряк', 'яблуко']
import operator
from functools import reduce
from django.db.models import F, Func, Q, Value

# ACCENTS is a string of accent characters, defined elsewhere
Word.objects.annotate(name_without_accent=Func(
    F('name'), Value('[%s]' % ACCENTS), Value(''), function='regexp_replace',
)).filter(reduce(operator.or_,
                 [Q(name_without_accent__icontains=w) for w in words], Q()))
The problem is that this query runs very slowly on such a big table. The words list can contain more elements (around 20-30).
Is there any way to improve the overall performance of querying such a big table? Improve the Django query? Tune Postgres? Use some external tools?
Upvotes: 0
Views: 176
Reputation: 2686
Try using unicodedata.normalize('NFD', word)
instead of the regexp. And if you run a lot of such queries, it would be reasonable to add an extra field with un-accented versions of the words and build an index on that field, because in your situation db_index doesn't help at all; it just slows the table down with reindexing.
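For illustration, here is a minimal sketch of that idea; the strip_accents helper, the name_unaccented field, and the save() override are assumed names for this example, not something from the question:

import unicodedata
import uuid

from django.db.models import CharField, Model, UUIDField
from django.utils.translation import gettext_lazy as _


def strip_accents(text):
    # NFD splits each accented letter into a base letter plus combining
    # marks; dropping the combining marks leaves the plain word.
    return ''.join(
        ch for ch in unicodedata.normalize('NFD', text)
        if not unicodedata.combining(ch)
    )


class Word(Model):
    id = UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = CharField(_('Word'), max_length=250)
    # Denormalized, indexed copy of the word without accents; this is the
    # column the queries should hit.
    name_unaccented = CharField(max_length=250, db_index=True, editable=False)

    def save(self, *args, **kwargs):
        self.name_unaccented = strip_accents(self.name)
        super().save(*args, **kwargs)

Exact lookups can then use the index directly, without any per-row regexp_replace:

words = ['морква', 'буряк', 'яблуко']
Word.objects.filter(name_unaccented__in=[strip_accents(w) for w in words])

(If substring matching is really needed, the original reduce/Q icontains filter can be pointed at name_unaccented instead, but a plain B-tree index only helps exact or prefix matches.)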
Upvotes: 2