Shawn Wei Chew
Shawn Wei Chew

Reputation: 15

Performing vector search with sqlalchemy and pgvector

I am trying to implement hybrid search in postgresql with pgvector and sqlalchemy.

Below is the table schema:

class Project_images(Base):
    __tablename__ = "project_images"
    id = Column(Integer, Sequence("project_image_id_seq"), primary_key=True)
    image_link = Column(String(255))
    image_vector = Column(Vector(512))
    keywords = Column(String(255))
    keyword_vector = Column(Vector(768))

And this is the function I call to perform the search:

def query_db(
        image_encoding,
        image_search_weight,
        keyword_encoding,
        keyword_search_weight,
    ):
        search_query = text(
            """
                SELECT *, 
                ((:image_encoding <=> image_vector) * :image_search_weight + (:keyword_encoding <=> keyword_vector) * :keyword_search_weight) 
                AS vector_sum
                FROM project_images
                ORDER BY vector_sum
                LIMIT 50

            """
        )

        params = {
            "image_encoding": image_encoding,
            "image_search_weight": image_search_weight,
            "keyword_encoding": keyword_encoding,
            "keyword_search_weight": keyword_search_weight,
        }

        with session_class() as session:
            result = session.execute(search_query, params)
            return result

Trying to call query_db gives me the following error:

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: record <=> vector LINE 3: ...02891638, 0.08573333, -0.011385784, -0.020549707) <=> image_... ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

I have tried session.execute(text("CREATE EXTENSION IF NOT EXISTS vector")), but I still run into the same error.

Upvotes: 1

Views: 8187

Answers (1)

Ian Wilson
Ian Wilson

Reputation: 9089

I don't use this library but I was able to get your code to execute using the libraries it seems you are using. It wasn't clear to me if <=> is commutative or not. The python wrapping for the extension seems to support using the operation as a method and that seemed to solve the problem.

class ProjectImage(Base):
    __tablename__ = "project_images"
    id = Column(Integer, Sequence("project_image_id_seq"), primary_key=True)
    image_link = Column(String(255))
    image_vector = Column(Vector(512))
    keywords = Column(String(255))
    keyword_vector = Column(Vector(768))


metadata.create_all(engine)


def query_db(
        image_encoding,
        image_search_weight,
        keyword_encoding,
        keyword_search_weight,
    ):
        # You can put this in the select() but I define it
        # first so you can see it.
        computed_col = (
            (ProjectImage.image_vector.cosine_distance(image_encoding)*image_search_weight)
            + (ProjectImage.keyword_vector.cosine_distance(keyword_encoding)*keyword_search_weight)
        ).label("vector_sum")

        search_query = select(
            computed_col
        ).order_by(
            # Reference the computed column.
            text("vector_sum")
        ).limit(50)

        with Session(engine) as session:
            result = session.execute(search_query)
            return result
# This is just dummy data I put in which appears to match what would be passed in.
query_db([1]*512, 1, [1]*768, 1)

This is output of the query with echo=True set on the engine.

SELECT (project_images.image_vector <=> %(image_vector_1)s) * %(param_1)s + (project_images.keyword_vector <=> %(keyword_vector_1)s) * %(param_2)s AS vector_sum 
FROM project_images ORDER BY vector_sum 
 LIMIT %(param_3)s

If you want the ProjectImage and the vector_sum you should be able to do this:

    search_query = select(
        ProjectImage, computed_col
    )
   # Then later...
   for (project_image, vector_sum) in session.execute(search_query):
       print(project_image, vector_sum)

I used this pgvector-python-sqlalchemy to find the cosine distance method and other examples.

Upvotes: 3

Related Questions