Matanel Abayof
Matanel Abayof

Reputation: 11

PostgreSQL unresponsive when restarting after many inserts using PGVector container

Description:

I am facing an issue where PostgreSQL becomes unresponsive after restarting. The setup utilizes the PGVector:pg16 Docker container. Below is the schema for the relevant table:

CREATE TABLE IF NOT EXISTS embeddings (
    content_id UUID NOT NULL,
    chunk_id UUID NOT NULL PRIMARY KEY,
    chunk_position INT NOT NULL,
    embedding_vector VECTOR(1536) NOT NULL,
    inserted_at TIMESTAMP DEFAULT TIMEZONE('utc', NOW())
);

The following indexes are created:

CREATE INDEX IF NOT EXISTS embedding_vector_idx 
ON embeddings USING hnsw (embedding_vector vector_ip_ops) 
WITH (m = 16, ef_construction = 64);

CREATE INDEX IF NOT EXISTS content_id_idx 
ON embeddings (content_id);

There is an insertion of 400,000 rows into the embeddings table in batches of 150 vectors. For example:

INSERT INTO embeddings (chunk_position, content_id, chunk_id, embedding_vector) 
VALUES (cp1, cid1, chid1, ev1), …., (cp150, cid150, chid150, ev150);

The average insertion time per embedding is approximately 7.8 seconds. During this process, the RAM usage increases linearly, reaching 4GB (approximately 1GB per 100,000 vectors).

At a certain point, the system runs out of memory, necessitating a restart of the PGVector container to free up memory. However, post-restart, access to the embeddings table is non-functional for about half an hour.

Guidance is sought on the following points:

  1. What could be causing the continuous increase in memory usage during bulk inserts into the embeddings table?
  2. Why does PostgreSQL become unresponsive for an extended period after restarting the PGVector container?

Tried:

Expected:

Result:

Upvotes: 1

Views: 107

Answers (0)

Related Questions