Tryph

Reputation: 6209

How to use a tsvector field to perform ranking in Django with PostgreSQL full-text search?

I need to run a ranking query using PostgreSQL's full-text search feature from Django, with the django.contrib.postgres module.

According to the documentation, this is quite easy to do with the SearchRank class:

>>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
>>> vector = SearchVector('body_text')
>>> query = SearchQuery('cheese')
>>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')

This probably works well, but it is not exactly what I want: my table already has a field containing tsvector data, and I would like to use that field instead of recomputing the tsvector on every search query.

Unfortunately, I can't figure out how to pass this tsvector field to the SearchRank class in place of a SearchVector built on a raw text field.

Does anyone know how to do this?

Edit: Of course, simply instantiating a SearchVector from the tsvector field does not work. It fails with this error (approximate wording, since I translated it from French):

django.db.utils.ProgrammingError: ERROR: function to_tsvector(tsvector) does not exist

Upvotes: 12

Views: 3455

Answers (2)

lys

Reputation: 1037

I've been seeing mixed answers here on Stack Overflow and in the official documentation. The documentation doesn't use F expressions for this, but that may simply be because it doesn't provide an example of using SearchRank with a SearchVectorField.

Looking at the output of .explain(analyze=True) :

Without the F expression:

Sort Key: (ts_rank(to_tsvector(COALESCE((search_vector)::text, ''::text)) 

When the F expression is used:

Sort Key: (ts_rank(search_vector, ...) 

In my experience, the only difference between passing an F expression and passing the field name as a string is that the F expression returns much faster but is sometimes less accurate, depending on how you structure the query. In some cases it can be useful to add a COALESCE explicitly. In my case, using the F expression with my SearchVectorField gives roughly a 3-5x speedup.

Passing an explicit config keyword argument to SearchQuery also improves things dramatically.

Upvotes: 0

Nad

Reputation: 556

If your model has a SearchVectorField like so:

from django.contrib.postgres.search import SearchVectorField
from django.db import models

class Entry(models.Model):
    ...
    search_vector = SearchVectorField()

you would use the F expression:

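from django.contrib.postgres.search import SearchRank
from django.db.models import F

...
# query is the SearchQuery from the question, e.g. SearchQuery('cheese')
Entry.objects.annotate(
    rank=SearchRank(F('search_vector'), query)
).order_by('-rank')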

Upvotes: 19
