Federico

Reputation: 84

Improve insert performance with PGVector and an HNSW index

I am using peewee and PGVector to manage a PostgreSQL database of documents. The documents are split into chunks, each stored through a DocumentChunk class, for example:

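from peewee import TextField
from pgvector.peewee import VectorField

# BaseModel and BGE_LARGE_DIMENSION are defined elsewhere in the project.
class DocumentChunk(BaseModel):
    id = TextField(primary_key=True)
    text = TextField()
    embedding = VectorField(dimensions=BGE_LARGE_DIMENSION)

    class Meta:
        table_name = "DocumentChunk"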

I have a large list of DocumentChunk objects that I would like to insert into the database. Because the table has an HNSW index, the inserts have been extremely slow. I've read that one way to improve this is to use COPY, in particular with FORMAT BINARY. However, I haven't been able to find a straightforward way to go from my list of DocumentChunk objects to the format that COPY expects.
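
To make the target format concrete, here is a minimal sketch of what a COPY ... (FORMAT BINARY) payload for such rows could look like, built with the standard library only. It assumes the column order (id, text, embedding), UTF-8 server encoding, and pgvector's binary vector layout (a 16-bit dimension count, a 16-bit unused field, then big-endian float4 values). The helper names encode_vector, encode_field and encode_rows are hypothetical, not part of peewee or pgvector.

```python
import struct
from typing import Iterable, Sequence, Tuple

# 11-byte signature, then a 32-bit flags field and a 32-bit
# header-extension length (both zero here).
COPY_HEADER = b"PGCOPY\n\xff\r\n\x00" + struct.pack("!ii", 0, 0)
# The file trailer is a 16-bit word containing -1.
COPY_TRAILER = struct.pack("!h", -1)


def encode_vector(values: Sequence[float]) -> bytes:
    """Assumed pgvector layout: uint16 dim, uint16 unused, then float4s."""
    return struct.pack(f"!HH{len(values)}f", len(values), 0, *values)


def encode_field(payload: bytes) -> bytes:
    """Each field is a 32-bit byte length followed by the raw bytes."""
    return struct.pack("!i", len(payload)) + payload


def encode_rows(rows: Iterable[Tuple[str, str, Sequence[float]]]) -> bytes:
    """Build a full binary COPY payload for (id, text, embedding) rows."""
    out = bytearray(COPY_HEADER)
    for chunk_id, text, embedding in rows:
        out += struct.pack("!h", 3)  # number of fields in this tuple
        out += encode_field(chunk_id.encode("utf-8"))
        out += encode_field(text.encode("utf-8"))
        out += encode_field(encode_vector(embedding))
    out += COPY_TRAILER
    return bytes(out)


# Hypothetical sample row; real rows would come from the DocumentChunk list,
# e.g. (chunk.id, chunk.text, chunk.embedding) for each chunk.
payload = encode_rows([("doc1-0", "hello", [0.5, -1.0, 2.0])])
```

A payload like this would then be streamed to the server through the driver's COPY support, for instance psycopg 3's cursor.copy(...) block with copy.write(payload), using a statement along the lines of COPY "DocumentChunk" (id, text, embedding) FROM STDIN WITH (FORMAT BINARY). If the pgvector Python package is registered with the connection, letting the driver encode each row may be simpler than packing bytes by hand; the manual version above is mainly meant to show what the format contains.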

Upvotes: 2

Views: 400

Answers (0)
