d0ugal
d0ugal

Reputation: 459

SQLAlchemy with PostgreSQL and Full Text Search

I'm using flask, sqlalchemy and flask-sqlalchemy. I want to create a full test search index in postgres with gin and to_tsvector. At the moment, I'm trying the following. I think its the closest I've got to what I'm trying to express, but doesn't work.

from sqlalchemy.ext.declarative import declared_attr
from sqlalchemy.schema import Index
from sqlalchemy.sql.expression import func

from app import db


class Post(db.Model):

    id = db.Column(db.Integer, primary_key=True)
    added = db.Column(db.DateTime, nullable=False)
    pub_date = db.Column(db.DateTime, nullable=True)
    content = db.Column(db.Text)

    @declared_attr
    def __table_args__(cls):
        return (Index('idx_content', func.to_tsvector("english", "content"), postgresql_using="gin"), )

This throws the following error...

Traceback (most recent call last):
  File "./manage.py", line 5, in <module>
    from app import app, db
  File "/vagrant/app/__init__.py", line 36, in <module>
    from pep.models import *
  File "/vagrant/pep/models.py", line 8, in <module>
    class Post(db.Model):
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/flask_sqlalchemy.py", line 477, in __init__
    DeclarativeMeta.__init__(self, name, bases, d)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/ext/declarative/api.py", line 48, in __init__
    _as_declarative(cls, classname, cls.__dict__)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/ext/declarative/base.py", line 222, in _as_declarative
    **table_kw)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 326, in __new__
    table._init(name, metadata, *args, **kw)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 393, in _init
    self._init_items(*args)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 63, in _init_items
    item._set_parent_with_dispatch(self)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/events.py", line 235, in _set_parent_with_dispatch
    self._set_parent(parent)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 2321, in _set_parent
    ColumnCollectionMixin._set_parent(self, table)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 1978, in _set_parent
    self.columns.add(col)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/sql/expression.py", line 2391, in add
    self[column.key] = column
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/sql/expression.py", line 2211, in __getattr__
    key)
AttributeError: Neither 'Function' object nor 'Comparator' object has an attribute 'key'

I've also tried

return (Index('idx_content', "content", postgresql_using="gin"), )

However, it doesn't work as postgres (9.1 at least, as that's what I run) expects to_tsvector to be called. This line creates the SQL;

CREATE INDEX content_index ON post USING gin (content)

rather than what I want;

CREATE INDEX content_index ON post USING gin(to_tsvector('english', content))

I opened a ticket as I think this may be a bug/limitation. http://www.sqlalchemy.org/trac/ticket/2605

Upvotes: 11

Views: 9026

Answers (4)

Jean Monet
Jean Monet

Reputation: 2655

Simplified, using ORM approach & TSVectorType helper from sqlalchemy-utils (https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html):

Defining a TSVECTOR column (TSVectorType) in your ORM model (declarative) populated automatically from the source text field(s)

import sqlalchemy as sa
from sqlalchemy_utils.types.ts_vector import TSVectorType
# ^-- https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html


class MyModel(Base):
    __tablename__ = 'mymodel'
    id = sa.Column(sa.Integer, primary_key=True)
    content = sa.Column(sa.String, nullable=False)
    content_tsv = sa.Column(
        TSVectorType("content", regconfig="english"),
        sa.Computed("to_tsvector('english', \"content\")", persisted=True))
    #      ^-- equivalent for SQL:
    #   COLUMN content_tsv TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', "content")) STORED;

    __table_args__ = (
        # Indexing the TSVector column
        sa.Index("idx_mymodel_content_tsv", content_tsv, postgresql_using="gin"), 
    )

How to query on the TSVector column? (and use ts_rank to return results ordered by TS rank)


tsquery = sa.func.plainto_tsquery("english", "multiple words query here")

stmt = sa.select(MyModel).where(
    MyModel.content_tsv.bool_op("@@")(tsquery)
).order_by(
    sa.func.ts_rank(MyModel.content_tsv, tsquery).desc()
).limit(5)

t, = db.session.execute(stmt).first()

Note important difference when querying with SQLAlchemy 1.4 vs SQLAlchemy 2.0:

If you try in SQLAlchemy 1.4:

stmt = sa.select(MyModel).where(
    MyModel.content_tsv.match("multiple words query here", postgresql_regconfig="english")
)

(as shown here for 1.4), instead of:

tsquery = sa.func.plainto_tsquery("english", "multiple words query here")
#                 ^^^^^^^^^^^^^^^
stmt = sa.select(MyModel).where(
    MyModel.content_tsv.bool_op("@@")(tsquery)
)

..it will translate to to_tsquery("english", "multiple words query here"), which only accepts a single term, and thus raise SyntaxError: ProgrammingError: (psycopg2.errors.SyntaxError) syntax error in tsquery: "multiple words query here".

This behavior is changed in SQLAlchemy 2.0, as mentioned here: https://docs.sqlalchemy.org/en/20/dialects/postgresql.html#full-text-search.

Starting with SQLAlchemy 2.0, TSVColumn.match(...) uses plainto_tsquery(...):

MyModel.content_tsv.match("multiple words query here", postgresql_regconfig="english")

will translate to:

plainto_tsquery("english", "multiple words query here")
^^^^^^^^^^^^^^^
# instead of
# to_tsquery("english", "multiple words query here")
# which is NOT the same

Upvotes: 3

benvc
benvc

Reputation: 15130

Ran across this old question as I was working on creating some single and multicolumn tsvector GIN indexes. For anyone that is looking for a simple way to create these indexes using string representations of the column names, here is one approach using the SQLAlchemy text() construct.

from sqlalchemy import Column, Index, Integer, String, text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func


Base = declarative_base()

def to_tsvector_ix(*columns):
    s = " || ' ' || ".join(columns)
    return func.to_tsvector('english', text(s))

class Example(Base):
    __tablename__ = 'examples'

    id = Column(Integer, primary_key=True)
    atext = Column(String)
    btext = Column(String)

    __table_args__ = (
        Index(
            'ix_examples_tsv',
            to_tsvector_ix('atext', 'btext'),
            postgresql_using='gin'
            ),
        )

Upvotes: 3

Damian
Damian

Reputation: 2788

So in sqlalchemy 0.9 and up this works:

class Content(Base, ):
    __tablename__ = 'content'

    id = sa.Column(sa.Integer, primary_key=True)

    description = sa.Column(sa.UnicodeText, nullable=False, server_default='')
    @declared_attr
    def __table_args__(cls):
        return (sa.Index('idx_content',
                     sa.sql.func.to_tsvector("english", cls.description),
                     postgresql_using="gin"), )

Notably, the difference from the first example is a direct reference to the column name, as opposed the the column name being provided in quotes, as that did not work.

Upvotes: 1

d0ugal
d0ugal

Reputation: 459

For now I've added the following lines to do it manually, but I'd much rather the 'correct' SQLAlchemy approach if there is one.

create_index = DDL("CREATE INDEX idx_content ON pep USING gin(to_tsvector('english', content));")
event.listen(Pep.__table__, 'after_create', create_index.execute_if(dialect='postgresql'))

There was some interesting discussion on the SQLAlchemy bug tracker. It looks like this is a limitation of the current indexing definition. Basically, my requirement is to allow indexes to be expressions rather than just column names but that isn't currently supported. This ticket is tracking this feature request: http://www.sqlalchemy.org/trac/ticket/695 . However, this is waiting for a developer to take forward and do the work (and has been for a while).

Upvotes: 4

Related Questions