Reputation: 815
I have a Postgres database containing a table of documents with 100-dimensional word embeddings, and I am using it to find similar documents.
CREATE TABLE documents(
    id bigint,
    title text,
    body text,
    vector double precision[],
    PRIMARY KEY(id)
);
I have installed the cube
extension and am using it to sort documents by similarity to a selected document, like this (as explained here):
SELECT id, title, body FROM documents
ORDER BY cube(documents.vector) <-> '(0.0990813672542572021,.. 0.0537704713642597198)'::cube
LIMIT 10;
I have set up this index:
CREATE INDEX ix_vect ON documents USING gist (cube(vector));
I am getting the results I expect, but the query time is inordinately long, ~30-45 seconds on a table of ~2 million rows. How can I bring it down to an acceptable level, i.e. under 1 second on millions of rows?
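One thing worth checking first is whether the planner is actually using the GiST index for the KNN ordering. A quick sketch (table, column, and index names taken from the question; the cube literal is shortened to two elements for readability, whereas the real vector would have all 100):

```sql
EXPLAIN ANALYZE
SELECT id, title, body
FROM documents
ORDER BY cube(vector) <-> '(0.0990813672542572021, 0.0537704713642597198)'::cube
LIMIT 10;
```

An "Index Scan using ix_vect" node in the plan means the index is driving the ordering; a "Seq Scan" followed by a sort means the query is scanning and sorting all ~2 million rows, which would explain the 30-45 second times. Note that because ix_vect is an expression index on cube(vector), the query's ORDER BY expression must match that expression for the index to be considered.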
Upvotes: 3
Views: 1098
Reputation: 990
The correct way to use cube, according to the documentation:
SELECT c FROM test ORDER BY c <-> cube(array[0.5,0.5,0.5]) LIMIT 1;
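Adapted to the question's schema, that would presumably look like the following (an untested sketch; the array literal is shortened and stands in for the full 100-dimensional vector):

```sql
SELECT id, title, body
FROM documents
ORDER BY cube(vector) <-> cube(array[0.0990813672542572021, 0.0537704713642597198])
LIMIT 10;
```

Building the target cube with cube(array[...]) instead of a text literal is mainly a convenience; either form can be compared with <->, as long as the left-hand expression matches the indexed expression cube(vector).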
Upvotes: 0