I am using peewee and pgvector to manage a PostgreSQL database of documents. The documents are split into chunks, each stored as a DocumentChunk model, for example:
class DocumentChunk(BaseModel):
    id = TextField(primary_key=True)
    text = TextField()
    embedding = VectorField(dimensions=BGE_LARGE_DIMENSION)

    class Meta:
        table_name = "DocumentChunk"
I have a large list of DocumentChunk objects that I would like to insert into the database. Because the table has an HNSW index, the inserts have been extremely slow. I've read that one way to improve this is to use COPY, in particular with FORMAT BINARY, as it says there. However, I haven't been able to find a straightforward way to go from my list of DocumentChunk objects to the format that COPY expects.
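For reference, here is a rough sketch of what I think the COPY path could look like. It is not something peewee provides; it assumes a separate psycopg 3 connection on which register_vector from pgvector.psycopg has already been called, and that each embedding is a numpy array. The helper names chunk_rows and copy_chunks are made up for illustration.

```python
from typing import Any, Iterable, Iterator, Tuple


def chunk_rows(chunks: Iterable[Any]) -> Iterator[Tuple[Any, Any, Any]]:
    """Yield (id, text, embedding) tuples in the column order used by COPY."""
    for chunk in chunks:
        yield chunk.id, chunk.text, chunk.embedding


def copy_chunks(conn: Any, chunks: Iterable[Any]) -> None:
    """Bulk-load chunks through a psycopg 3 connection using binary COPY.

    Assumption: register_vector(conn) from pgvector.psycopg was called
    beforehand, so psycopg can resolve and dump the 'vector' type.
    Assumption: each chunk.embedding is a numpy array.
    """
    # The table name is quoted because peewee created it with mixed case.
    sql = (
        'COPY "DocumentChunk" ("id", "text", "embedding") '
        "FROM STDIN WITH (FORMAT BINARY)"
    )
    with conn.cursor() as cur:
        with cur.copy(sql) as copy:
            # Binary COPY needs the column types stated explicitly.
            copy.set_types(["text", "text", "vector"])
            for row in chunk_rows(chunks):
                copy.write_row(row)
```

Usage would be roughly: open a connection with psycopg.connect(...), call register_vector(conn), call copy_chunks(conn, my_chunks), then conn.commit(). As far as I understand, COPY does not skip index maintenance by itself, and the pgvector README suggests creating indexes after the initial data load, so dropping and rebuilding the HNSW index around the load may matter as much as the COPY itself.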