Daniil Kolesnichenko

Reputation: 63

How to calculate amount of RAM required for serving X N-dimensional vectors with pgvector (HNSW)?

I'm trying to determine a function f(x, n), where x is the number of vectors and n is the number of vector dimensions, such that for any x and n, f(x, n) is greater than the amount of RAM required to serve these vectors via a pgvector database with an HNSW index.

I did some googling, but

  1. I can't find a formula for index size.
  2. I see that some databases (like Qdrant) can use memory-mapped files for both vectors and indices to reduce RAM consumption, but I don't see such an option for pgvector. Does pgvector require the whole HNSW index to be stored in memory? If not, how do I determine the required amount of memory?

This article says:

The third trade-off for HNSW indexes is that they are sizable — the index for 1M rows of AI embeddings can be 8GB or larger. For performance reasons, you’ll want all of this index in memory.

although in this article Qdrant, which uses the same HNSW indexing algorithm, is said to require only ~1GB of RAM to serve 1 million vectors (without memory mapping). Why such a difference?

Upvotes: 3

Views: 1845

Answers (1)

lthomas123

Reputation: 79

According to the FAISS documentation, the memory consumption of an HNSW index, in bytes per vector, is (d * 4 + M * 2 * 4), where d is the dimensionality of the indexed vectors and M is the number of edges per node in the constructed graph (a typical default is 32).
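
For convenience, here is that formula as a small Python helper (a rough sketch; the function name and the assumption of 4-byte float32 components and 4-byte neighbor ids are mine):

    def hnsw_bytes_per_vector(d: int, m: int = 32) -> int:
        """Rough per-vector memory estimate for a flat (non-quantized) HNSW index."""
        vector_bytes = d * 4    # the float32 vector itself, 4 bytes per component
        edge_bytes = m * 2 * 4  # 2*M neighbor ids at the base layer, 4 bytes each
        return vector_bytes + edge_bytes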

The Qdrant article you linked uses the glove100_angular dataset, which has a dimensionality of 100, so it would need 100 * 4 + 32 * 2 * 4 = 656 bytes/vector to index with HNSW.

This would require 656,000,000 bytes (0.656GB) of memory to index 1M vectors, which tracks with the Qdrant article's estimated ~1GB of RAM.
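
Spelled out (same 4-byte float32 and 4-byte neighbor-id assumptions as above):

    d, m, n = 100, 32, 1_000_000          # glove100_angular dims, default M, 1M vectors
    bytes_per_vector = d * 4 + m * 2 * 4  # 400 + 256 = 656 bytes
    print(bytes_per_vector * n / 1e9)     # -> 0.656 (GB)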

Looking at the pgvector documentation, it appears the whole index is not required to fit into RAM; however, index builds take significantly longer once the index no longer fits into memory (maintenance_work_mem), which I assume is because Postgres then builds the index on disk.
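
As a rough illustration of that point, this is how you might raise the build memory before creating the index (the items table, embedding column, and connection string are made up for the example; the 2GB figure is just comfortably above the per-vector estimate worked out above):

    import psycopg  # psycopg 3

    with psycopg.connect("dbname=vectors") as conn, conn.cursor() as cur:
        # Give the build enough memory for the whole graph; ~656 bytes/vector
        # (d=100, M=32) is well under 1GB per 1M vectors, so 2GB is generous.
        cur.execute("SET maintenance_work_mem = '2GB'")
        cur.execute(
            "CREATE INDEX ON items "
            "USING hnsw (embedding vector_cosine_ops) "
            "WITH (m = 32, ef_construction = 64)"
        )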

Note: In the memory consumption formula provided by FAISS, (d * 4 + M * 2 * 4), the first term, d * 4, appears to correspond to the bytes used to store the vector itself, while M * 2 * 4 corresponds to the bytes needed to store references to the node's 2 * M base-layer neighbors. Since the HNSW index builds a graph with layers that are successively more sparsely populated, where the base layer uses M * 2 connections per node (by default) and every higher layer uses M connections per node, my understanding is that the formula above only accounts for the base layer.

Each node is randomly promoted to the upper layers with a probability controlled by the level-generation factor m_L; the HNSW paper's suggested default is m_L = 1/ln(M). Treating each layer as holding roughly 1/ln(M) of the nodes of the layer below (if anything, an overestimate of the upper layers), with the default M=32 the number of nodes per layer for a graph indexing 1M vectors would look like this (a quick Python version follows the table):

L0: 1000000
L1: 1000000 / ln(32) ~= 288539
L2: 288539 / ln(32) ~= 83254
L3: 83254 / ln(32) ~= 24022
L4: 24022 / ln(32) ~= 6931
L5: 6931 / ln(32) ~= 1999
L6: 1999 / ln(32) ~= 577
L7: 577 / ln(32) ~= 166
L8: 166 / ln(32) ~= 48
L9: 48 / ln(32) ~= 13 --> this value is less than M=32, so it becomes the last layer of the graph (first layer to be searched)
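
A minimal sketch of that calculation in Python, under the same 1/ln(M) per-layer assumption (this mirrors the estimate above, not pgvector's or hnswlib's actual level assignment):

    import math

    def layer_sizes(n: int, m: int = 32):
        """Layer populations assuming each layer holds ~1/ln(M) of the nodes of
        the layer below; values may differ from the hand calculation by +/-1
        due to rounding."""
        sizes = [n]
        while sizes[-1] >= m:  # keep adding layers until one holds fewer than M nodes
            sizes.append(int(sizes[-1] / math.log(m)))
        return sizes

    for level, count in enumerate(layer_sizes(1_000_000)):
        print(f"L{level}: {count}")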

The sum of all nodes in layers above L0 is bounded above by N, so the total number of edge references for N vectors is bounded by:
M * 2 * N (edges for L0) + M * N (edges for all layers above L0) = 3 * M * N.

This would put the memory consumption per vector at (d * 4 + M * 3 * 4), so the dataset in the Qdrant article would require ((100 * 4) + (32 * 3 * 4)) * 1,000,000 = 784,000,000 bytes (0.784GB). This is still very close to the estimate from the formula in the FAISS documentation, since the vast majority of the memory goes to storing the vectors themselves rather than the references to each vector's neighbors, but the difference can be meaningful in some cases, especially for low-dimensional vectors, where the neighbor references consume proportionally more memory.
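
Putting it together, the question's f(x, n) could be sketched as below. This counts only the vectors and the base/upper-layer neighbor references as reasoned above and ignores Postgres page and tuple overhead, so treat it as a rough estimate to pad rather than a hard upper bound:

    def f(x: int, n: int, m: int = 32) -> int:
        """Estimated HNSW index memory in bytes for x vectors of dimensionality n:
        n*4 bytes for the float32 vector plus 3*M neighbor references of 4 bytes
        each (2*M at the base layer, ~M across all upper layers combined)."""
        return x * (n * 4 + m * 3 * 4)

    print(f(1_000_000, 100) / 1e9)  # glove100_angular example -> 0.784 (GB)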

Upvotes: 1
