Santino
Santino

Reputation: 815

postgres cube euclidean distance query performance issues

I have a postgres database containing a table of documents with word embeddings of 100 dimensions and using it to find similar documents.

CREATE TABLE documents(
   id bigint,
   title text,
   body text,
   vector double[],
   PRIMARY KEY(id)
);

I have installed the cube extension and using it to sort documents by similarity from a selected document thus (as explained here):

SELECT id,title,body FROM documents ORDER BY cube(documents.vector) 
 <-> '(0.0990813672542572021,.. 0.0537704713642597198)'::cube  LIMIT 10;

I have the index setup here: CREATE INDEX ix_vect ON documents USING gist (cube(vector));

I am getting results as expected, but the query time is inordinately long ~30-45 second for a table size of ~2 Million rows. How can I improve performance to bring it down to acceptable levels i.e. <1 sec on millions of rows?

Upvotes: 3

Views: 1098

Answers (1)

Milad shiri
Milad shiri

Reputation: 990

The correct way to use CUBE by doc:

SELECT c FROM test ORDER BY c <-> cube(array[0.5,0.5,0.5]) LIMIT 1;

Upvotes: 0

Related Questions