Reputation: 3140
I'm using Solr 9 for query-document similarity calculations. I have a use case where I first need to filter on specific field values, and then compute the similarity of every document that matches.
My problem is as follows: each document has the fields "embedding" and "id". I want to retrieve only the documents with id=1, 2, 3 and, given a query embedding, return the similarity score between each of those documents and the query embedding.
Option 1: Filter on the ids using fq, and put a knn query in the q parameter. Not all of the documents I want are returned, because of the limitation below.
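A rough sketch of the Option 1 request, expressed as Python request parameters. The field names "embedding" and "id" come from the question; the helper name and the topK value of 10 are illustrative assumptions, and the parameters would be sent to the collection's /select handler.

```python
from urllib.parse import urlencode


def build_knn_filtered_params(ids, query_vector, top_k=10, field="embedding"):
    """Hypothetical helper: filter on ids with fq and rank with the knn parser in q."""
    vector = "[" + ", ".join(str(float(x)) for x in query_vector) + "]"
    return {
        # knn query parser: finds the top_k nearest neighbours of the query vector
        "q": f"{{!knn f={field} topK={top_k}}}{vector}",
        # filter query restricting the results to the requested ids
        "fq": "id:(" + " ".join(str(i) for i in ids) + ")",
        "fl": "id,score",
    }


params = build_knn_filtered_params([1, 2, 3], [0.1, 0.2, 0.3])
print(urlencode(params))
```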
The main issue with this is the following documented limitation:
When using knn in re-ranking pay attention to the topK parameter.
The second pass score(deriving from knn) is calculated only if the document d from the first pass is within the k-nearest neighbors(in the whole index) of the target vector to search.
This means the second pass knn is executed on the whole index anyway, which is a current limitation.
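To make the quoted limitation concrete, here is a toy pure-Python model of it (not Solr code, and the vectors are made up): the nearest neighbours are computed over the whole index, and a first-pass document only gets a knn score if it lands in that global top K. Documents outside the global top K are marked None here simply to show that they receive no knn score.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_scores_for_first_pass(first_pass_ids, index, query, top_k):
    """Toy model: knn runs on the whole index; only first-pass docs in the global top K get a score."""
    ranked = sorted(index, key=lambda doc_id: cosine(index[doc_id], query), reverse=True)
    global_top_k = set(ranked[:top_k])
    return {
        doc_id: cosine(index[doc_id], query) if doc_id in global_top_k else None
        for doc_id in first_pass_ids
    }


# Made-up index: docs 4 and 5 are close to the query but are not among the wanted ids.
index = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.9, 0.1], 4: [1.0, 0.05], 5: [0.99, 0.1]}
scores = knn_scores_for_first_pass([1, 2, 3], index, [1.0, 0.0], top_k=2)
print(scores)
```

Document 3 is quite similar to the query, yet it gets no knn score because documents 1 and 4 fill the global top 2.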
Option 2: Filter on the ids using fq, request the embedding field in the field list (fl), and compute the similarities in memory. The issue with this is network latency: the response from Solr is large when it includes the embeddings.
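A minimal sketch of the client-side part of Option 2, assuming the request was something like fq=id:(1 2 3)&fl=id,embedding with the embedding field returnable, and that the response uses Solr's standard JSON layout (response.docs). The helper names are hypothetical. It shows the in-memory cosine computation only; it does nothing about the response size.

```python
import json
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_docs(solr_json, query_vector, field="embedding"):
    """Hypothetical helper: cosine similarity of each returned doc's embedding to the query."""
    docs = json.loads(solr_json)["response"]["docs"]
    return {doc["id"]: cosine_similarity(doc[field], query_vector) for doc in docs}


# Made-up response body with two tiny 2-dimensional embeddings.
body = (
    '{"response": {"numFound": 2, "docs": ['
    '{"id": "1", "embedding": [1.0, 0.0]}, '
    '{"id": "2", "embedding": [0.0, 2.0]}]}}'
)
print(score_docs(body, [1.0, 0.0]))
```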
That leaves the following two questions:
Thanks!
Upvotes: 1
Views: 916
Reputation: 1
You can try the vectorSimilarity function, which returns the similarity between two KNN vectors in an n-dimensional space. Refer to the vectorSimilarity function in the Solr 9.4 documentation.
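A sketch of how this could look, assuming vectorSimilarity is used as the main function query so that the score of every filtered document is its similarity to the query vector. FLOAT32 and COSINE are assumptions that should match how the "embedding" field is configured, and the helper name is hypothetical. Because the function scores every document that passes fq, the topK limitation of knn re-ranking does not apply.

```python
from urllib.parse import urlencode


def build_vector_similarity_params(ids, query_vector, field="embedding"):
    """Hypothetical helper: score only the filtered docs by vectorSimilarity to a constant vector."""
    vector = "[" + ",".join(str(float(x)) for x in query_vector) + "]"
    return {
        # function query: the document score is the vector similarity itself
        "q": f"{{!func}}vectorSimilarity(FLOAT32,COSINE,{field},{vector})",
        # restrict scoring to the requested ids
        "fq": "id:(" + " ".join(str(i) for i in ids) + ")",
        "fl": "id,score",
        # return every requested id (the default rows is 10)
        "rows": len(ids),
    }


params = build_vector_similarity_params([1, 2, 3], [0.1, 0.2, 0.3])
print(urlencode(params))
```

This keeps the response small (only id and score come back), which addresses the network concern raised in Option 2.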
Upvotes: 0