Reputation: 11
Description:
I am facing an issue where PostgreSQL becomes unresponsive after restarting. The setup utilizes the PGVector:pg16
Docker container. Below is the schema for the relevant table:
CREATE TABLE IF NOT EXISTS embeddings (
content_id UUID NOT NULL,
chunk_id UUID NOT NULL PRIMARY KEY,
chunk_position INT NOT NULL,
embedding_vector VECTOR(1536) NOT NULL,
inserted_at TIMESTAMP DEFAULT TIMEZONE('utc', NOW())
);
The following indexes are created:
CREATE INDEX IF NOT EXISTS embedding_vector_idx
ON embeddings USING hnsw (embedding_vector vector_ip_ops)
WITH (m = 16, ef_construction = 64);
CREATE INDEX IF NOT EXISTS content_id_idx
ON embeddings (content_id);
There is an insertion of 400,000 rows into the embeddings
table in batches of 150 vectors. For example:
INSERT INTO embeddings (chunk_position, content_id, chunk_id, embedding_vector)
VALUES (cp1, cid1, chid1, ev1), …., (cp150, cid150, chid150, ev150);
The average insertion time per embedding is approximately 7.8 seconds. During this process, the RAM usage increases linearly, reaching 4GB (approximately 1GB per 100,000 vectors).
At a certain point, the system runs out of memory, necessitating a restart of the PGVector container to free up memory. However, post-restart, access to the embeddings
table is non-functional for about half an hour.
Guidance is sought on the following points:
embeddings
table?Tried:
embeddings
table in batches of 150 vectors.Expected:
Result:
embeddings
table was restored.Upvotes: 1
Views: 107