Reputation: 459
I'm using flask, sqlalchemy and flask-sqlalchemy. I want to create a full test search index in postgres with gin and to_tsvector. At the moment, I'm trying the following. I think its the closest I've got to what I'm trying to express, but doesn't work.
from sqlalchemy.ext.declarative import declared_attr
from sqlalchemy.schema import Index
from sqlalchemy.sql.expression import func
from app import db
class Post(db.Model):
id = db.Column(db.Integer, primary_key=True)
added = db.Column(db.DateTime, nullable=False)
pub_date = db.Column(db.DateTime, nullable=True)
content = db.Column(db.Text)
@declared_attr
def __table_args__(cls):
return (Index('idx_content', func.to_tsvector("english", "content"), postgresql_using="gin"), )
This throws the following error...
Traceback (most recent call last):
File "./manage.py", line 5, in <module>
from app import app, db
File "/vagrant/app/__init__.py", line 36, in <module>
from pep.models import *
File "/vagrant/pep/models.py", line 8, in <module>
class Post(db.Model):
File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/flask_sqlalchemy.py", line 477, in __init__
DeclarativeMeta.__init__(self, name, bases, d)
File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/ext/declarative/api.py", line 48, in __init__
_as_declarative(cls, classname, cls.__dict__)
File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/ext/declarative/base.py", line 222, in _as_declarative
**table_kw)
File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 326, in __new__
table._init(name, metadata, *args, **kw)
File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 393, in _init
self._init_items(*args)
File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 63, in _init_items
item._set_parent_with_dispatch(self)
File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/events.py", line 235, in _set_parent_with_dispatch
self._set_parent(parent)
File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 2321, in _set_parent
ColumnCollectionMixin._set_parent(self, table)
File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 1978, in _set_parent
self.columns.add(col)
File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/sql/expression.py", line 2391, in add
self[column.key] = column
File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/sql/expression.py", line 2211, in __getattr__
key)
AttributeError: Neither 'Function' object nor 'Comparator' object has an attribute 'key'
I've also tried
return (Index('idx_content', "content", postgresql_using="gin"), )
However, it doesn't work as postgres (9.1 at least, as that's what I run) expects to_tsvector to be called. This line creates the SQL;
CREATE INDEX content_index ON post USING gin (content)
rather than what I want;
CREATE INDEX content_index ON post USING gin(to_tsvector('english', content))
I opened a ticket as I think this may be a bug/limitation. http://www.sqlalchemy.org/trac/ticket/2605
Upvotes: 11
Views: 9026
Reputation: 2655
Simplified, using ORM approach & TSVectorType
helper from sqlalchemy-utils
(https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html):
TSVECTOR
column (TSVectorType
) in your ORM model (declarative) populated automatically from the source text field(s)import sqlalchemy as sa
from sqlalchemy_utils.types.ts_vector import TSVectorType
# ^-- https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html
class MyModel(Base):
__tablename__ = 'mymodel'
id = sa.Column(sa.Integer, primary_key=True)
content = sa.Column(sa.String, nullable=False)
content_tsv = sa.Column(
TSVectorType("content", regconfig="english"),
sa.Computed("to_tsvector('english', \"content\")", persisted=True))
# ^-- equivalent for SQL:
# COLUMN content_tsv TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', "content")) STORED;
__table_args__ = (
# Indexing the TSVector column
sa.Index("idx_mymodel_content_tsv", content_tsv, postgresql_using="gin"),
)
ts_rank
to return results ordered by TS rank)
tsquery = sa.func.plainto_tsquery("english", "multiple words query here")
stmt = sa.select(MyModel).where(
MyModel.content_tsv.bool_op("@@")(tsquery)
).order_by(
sa.func.ts_rank(MyModel.content_tsv, tsquery).desc()
).limit(5)
t, = db.session.execute(stmt).first()
SQLAlchemy 1.4
vs SQLAlchemy 2.0
:If you try in SQLAlchemy 1.4
:
stmt = sa.select(MyModel).where(
MyModel.content_tsv.match("multiple words query here", postgresql_regconfig="english")
)
(as shown here for 1.4
),
instead of:
tsquery = sa.func.plainto_tsquery("english", "multiple words query here")
# ^^^^^^^^^^^^^^^
stmt = sa.select(MyModel).where(
MyModel.content_tsv.bool_op("@@")(tsquery)
)
..it will translate to to_tsquery("english", "multiple words query here")
, which only accepts a single term, and thus raise SyntaxError
: ProgrammingError: (psycopg2.errors.SyntaxError) syntax error in tsquery: "multiple words query here"
.
This behavior is changed in SQLAlchemy 2.0
, as mentioned here: https://docs.sqlalchemy.org/en/20/dialects/postgresql.html#full-text-search.
SQLAlchemy 2.0
, TSVColumn.match(...)
uses plainto_tsquery(...)
:MyModel.content_tsv.match("multiple words query here", postgresql_regconfig="english")
will translate to:
plainto_tsquery("english", "multiple words query here")
^^^^^^^^^^^^^^^
# instead of
# to_tsquery("english", "multiple words query here")
# which is NOT the same
Upvotes: 3
Reputation: 15130
Ran across this old question as I was working on creating some single and multicolumn tsvector GIN indexes. For anyone that is looking for a simple way to create these indexes using string representations of the column names, here is one approach using the SQLAlchemy text()
construct.
from sqlalchemy import Column, Index, Integer, String, text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func
Base = declarative_base()
def to_tsvector_ix(*columns):
s = " || ' ' || ".join(columns)
return func.to_tsvector('english', text(s))
class Example(Base):
__tablename__ = 'examples'
id = Column(Integer, primary_key=True)
atext = Column(String)
btext = Column(String)
__table_args__ = (
Index(
'ix_examples_tsv',
to_tsvector_ix('atext', 'btext'),
postgresql_using='gin'
),
)
Upvotes: 3
Reputation: 2788
So in sqlalchemy 0.9 and up this works:
class Content(Base, ):
__tablename__ = 'content'
id = sa.Column(sa.Integer, primary_key=True)
description = sa.Column(sa.UnicodeText, nullable=False, server_default='')
@declared_attr
def __table_args__(cls):
return (sa.Index('idx_content',
sa.sql.func.to_tsvector("english", cls.description),
postgresql_using="gin"), )
Notably, the difference from the first example is a direct reference to the column name, as opposed the the column name being provided in quotes, as that did not work.
Upvotes: 1
Reputation: 459
For now I've added the following lines to do it manually, but I'd much rather the 'correct' SQLAlchemy approach if there is one.
create_index = DDL("CREATE INDEX idx_content ON pep USING gin(to_tsvector('english', content));")
event.listen(Pep.__table__, 'after_create', create_index.execute_if(dialect='postgresql'))
There was some interesting discussion on the SQLAlchemy bug tracker. It looks like this is a limitation of the current indexing definition. Basically, my requirement is to allow indexes to be expressions rather than just column names but that isn't currently supported. This ticket is tracking this feature request: http://www.sqlalchemy.org/trac/ticket/695 . However, this is waiting for a developer to take forward and do the work (and has been for a while).
Upvotes: 4