PepperoniPizza
PepperoniPizza

Reputation: 9112

Django Fulltext search on JSON field

I am trying to implement the newly added Fulltext-search Postgres support for Django 1.10.

One of the fields I'm trying to search on is a JSON field:

from django.contrib.postgres.fields import JSONField
class Product():
        attributes = JSONField(...)

If I try to do the following search using a SearchVector

Product.objects.annotate(
    search=SearchVector('attributes'),
).filter(search=keyword)

Will raise:

django.db.utils.DataError: invalid input syntax for type json
LINE 1: ...ol1, to_tsvector(COALESCE("product"."attributes", '')) AS "s...
                                                         ^
DETAIL:  The input string ended unexpectedly.
CONTEXT:  JSON data, line 1:

Which makes sense, a raw SQL query like this will need to be

select to_tsvector(attributes::text) from product;

But how can I achieve that field conversion inside Django syntax ?

Upvotes: 3

Views: 3898

Answers (2)

e4c5
e4c5

Reputation: 53774

Why you shouldn't do this

The very reason that JSONB data type and JSON/JSONB functions were introduced was to avoid this kind of search! The Django ORM provides access to most of this functionality. The Func expression can be used to for functionality that cannot quite be reached with the double underscore notation etc.

If on the other hand you have a large text field in your JSONB column that really does need to be full text searched. This indicates that the database design isn't optimal. That field should be pulled out of the JSON and it should be a field on it's own right.

If you still want to do this

Django 1.10 just added a Cast function.

class Cast(expression, output_field)
Forces the result type of expression to be the one from output_field.

If you are on an older version of Django you can use the RawSQL function. You will find an example on the same page linked above. Please note that the use of the RawSQL function is not the same as executing a raw query.

Update Jan 2018

There is a certain belief that using RawSQL will make your code non portable and you should never use it. This question deals with the django.contrib.postgres package, your code most definitely isn't portable. So if you want to avoid raw sql, it should be for a reason other than portability.

Upvotes: 1

Paolo Melchiorre
Paolo Melchiorre

Reputation: 6122

As @e4c5 reported there's Cast function since Django 1.10 (the same version you used).

So if you would to search a JSON field as text you have to cast it as text:

from django.contrib.postgres.search import SearchVector
from django.db.models import TextField
from django.db.models.functions import Cast

Product.objects.annotate(
    search=SearchVector(Cast('attributes', TextField())),
).filter(search=keyword)

You can also use only specific part of your JSON field in your SearchVector:

from django.contrib.postgres.search import SearchVector
from django.contrib.postgres.fields.jsonb import KeyTextTransform

Product.objects.annotate(
    search=SearchVector(KeyTextTransform('key1', 'attributes')),
).filter(search=keyword)

PostgreSQL 10 added Full Text Search support for JSON and JSONB.

Upvotes: 8

Related Questions