I am passing a dense vector to Solr 9 for indexing, but Solr takes the values and puts them into a field whose data type is pdoubles. I changed managed-schema.xml to declare the field named vector as a knn_vector, but Solr dynamically created a new field named vectors of type pdoubles.
Lines that I added to managed-schema.xml:
<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="768" similarityFunction="euclidean"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>
Line that Solr added dynamically by itself:
<field name="vectors" type="pdoubles"/>
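Instead of editing managed-schema.xml by hand, you can probably declare the same field type and field through Solr's Schema API. The sketch below uses only the Python standard library. It assumes Solr runs on localhost with the collection VectorPilotRun, both taken from the question's pysolr URL. The helper names are hypothetical. Run it only against a collection where knn_vector and vector are not already defined, because Solr rejects adding a type or field that already exists.

```python
import json
import urllib.request

# Assumption: local Solr and the collection name from the question's pysolr URL.
SCHEMA_URL = "http://localhost:8983/solr/VectorPilotRun/schema"


def build_schema_commands(dim=768, similarity="euclidean"):
    """Schema API commands declaring the knn_vector type, then the 'vector' field.

    Solr applies commands in the order given, so the type is added before the
    field that uses it (dicts keep insertion order when serialised).
    """
    return {
        "add-field-type": {
            "name": "knn_vector",
            "class": "solr.DenseVectorField",
            "vectorDimension": dim,
            "similarityFunction": similarity,
        },
        "add-field": {
            "name": "vector",
            "type": "knn_vector",
            "indexed": True,
            "stored": True,
        },
    }


def post_json(payload, url=SCHEMA_URL):
    """POST a JSON payload to Solr and return the decoded JSON response."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a running Solr):
# print(post_json(build_schema_commands()))
```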
For reference, here is my code:
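import numpy as np
from sentence_transformers import SentenceTransformer

# documents is a dict keyed '0', '1', ..., built earlier (not shown)
embedder = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')
corpus = [documents[d]['paragraph'] for d in documents]
corpus_embeddings = embedder.encode(corpus, convert_to_tensor=False)
# Store each embedding on its document as a plain list of floats
for d, row in enumerate(corpus_embeddings):
    documents[str(d)]['vectors'] = np.array(row).tolist()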
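The loop stores each embedding under the key vectors, but the schema declares the field as vector. Because vectors is not in the schema, Solr's schemaless field guessing (enabled in the default configset) most likely picked a type for it. A list of floating-point numbers maps to pdoubles, which matches the line Solr added.

The sketch below builds documents whose key matches the schema field name. It also checks each vector's length against the 768 declared in vectorDimension, which is the embedding size of the DistilBERT model used here. The function name to_solr_docs and its parameters are hypothetical. Like the loop, it assumes the documents are keyed '0', '1', ... in the same order as the embeddings.

```python
def to_solr_docs(documents, embeddings, field="vector", dim=768):
    """Return copies of the documents with each embedding stored under `field`.

    `field` must match the knn_vector field name in the schema, and every
    embedding must have exactly `dim` values (the schema's vectorDimension).
    """
    docs = []
    for i, row in enumerate(embeddings):
        vec = [float(x) for x in row]  # plain Python floats, JSON-serialisable
        if len(vec) != dim:
            raise ValueError(f"document {i}: expected {dim} values, got {len(vec)}")
        doc = dict(documents[str(i)])  # shallow copy; the input dict is untouched
        doc.pop("vectors", None)       # drop the misnamed key if it is present
        doc[field] = vec
        docs.append(doc)
    return docs
```

The result could then be passed to pysolr's Solr.add, for example solr.add(to_solr_docs(documents, corpus_embeddings)), assuming each document already carries its id.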
import pysolr

solr = pysolr.Solr('http://localhost:8983/solr/VectorPilotRun/', always_commit=True, timeout=10)
results = solr.search("{!knn f=vector topK=10}" + str(documents['500']['vectors']))
print("Saw {0} result(s).".format(len(results)))
for result in results:
    print("The details are : '{0} {1} {2}'\n.".format(result['id'], result['paragraph'], result['paragraph_num']))
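The {!knn} query parser expects the query vector as a bracketed, comma-separated list of numbers, which is why str() of a Python list happens to work here. The sketch below builds the string explicitly. knn_query is a hypothetical helper whose field defaults to vector, the name declared in the schema. The query can only return hits if the documents were indexed into that same field.

```python
def knn_query(vector, field="vector", top_k=10):
    """Build a Solr knn query string such as '{!knn f=vector topK=10}[0.1, 0.2]'."""
    values = ", ".join(repr(float(x)) for x in vector)
    return "{!knn f=%s topK=%d}[%s]" % (field, top_k, values)
```

It would be used as solr.search(knn_query(query_vector)), where query_vector is a 768-element list.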
This search returns no results: querying the knn_vector field (that is, the vector field) finds nothing. I believe this is because all the data ended up in the vectors field (pdoubles) instead of the vector field (knn_vector).
How do I add data so that it is stored in the correct field with the correct type, instead of Solr dynamically creating a field of another type? I used pysolr to add the data, and each vector is a list of float values.
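To make Solr refuse unknown fields instead of silently creating them, you can switch off schemaless field guessing for the collection. The Solr Reference Guide documents the user property update.autoCreateFields for this, which is set through the Config API. The sketch below uses only the standard library and again assumes a local Solr with the VectorPilotRun collection. The helper names are hypothetical.

With guessing disabled, a document that still uses the key vectors should be rejected with an unknown-field error rather than indexed as pdoubles, unless some dynamicField pattern matches the name. The vectors field that Solr already created stays in the schema until it is deleted.

```python
import json
import urllib.request

# Assumption: local Solr and the collection name from the question's pysolr URL.
CONFIG_URL = "http://localhost:8983/solr/VectorPilotRun/config"


def auto_create_fields_command(enabled):
    """Config API command that turns schemaless field creation on or off."""
    return {"set-user-property": {"update.autoCreateFields": "true" if enabled else "false"}}


def post_config(payload, url=CONFIG_URL):
    """POST a JSON payload to the Config API and return the decoded response."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a running Solr):
# post_config(auto_create_fields_command(False))
```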
Besides managed-schema.xml, you should add a separate schema.xml file to the configset's conf directory before creating the Solr collection. For example:
<schema name="visual-search1" version="1.0">
<fieldType name="string" class="solr.StrField" omitNorms="true" positionIncrementGap="0"/>
<!-- vector-based field -->
<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="512" similarityFunction="cosine" omitNorms="true"/>
<fieldType name="long" class="org.apache.solr.schema.LongPointField" docValues="true" omitNorms="true" positionIncrementGap="0"/>
<!-- basic text field -->
<fieldType name="text" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="false"/>
<field name="image_path" type="text" indexed="true" stored="true"/>
<field name="gender" type="text" indexed="true" stored="true"/>
<field name="article_type" type="text" indexed="true" stored="true"/>
<field name="color" type="text" indexed="true" stored="true"/>
<field name="sub_category" type="text" indexed="true" stored="true"/>
<field name="feature" type="knn_vector" indexed="true" stored="true" multiValued="false"/>
<field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
<uniqueKey>id</uniqueKey>
</schema>
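Whichever schema file is used, the keys in the indexed documents must match the declared field names. Each vector must also have exactly vectorDimension values: 512 in this example, while the question's model produces 768.

The sketch below is a rough pre-indexing check built on the standard library's xml.etree. The helpers are hypothetical. They only read <field> elements and DenseVectorField <fieldType> elements, and they ignore dynamicField and copyField rules.

```python
import xml.etree.ElementTree as ET


def field_dimensions(schema_xml):
    """Map each declared field name to its vectorDimension (None for non-vector fields)."""
    root = ET.fromstring(schema_xml)
    dims = {
        ft.get("name"): int(ft.get("vectorDimension"))
        for ft in root.iter("fieldType")
        if ft.get("class") == "solr.DenseVectorField"
    }
    return {f.get("name"): dims.get(f.get("type")) for f in root.iter("field")}


def check_doc(doc, fields):
    """List the keys Solr would not index as intended under this schema."""
    problems = []
    for key, value in doc.items():
        if key not in fields:
            problems.append(f"{key}: not declared in schema")
        elif fields[key] is not None and len(value) != fields[key]:
            problems.append(f"{key}: expected {fields[key]} values, got {len(value)}")
    return problems
```

With the schema above, a document whose embedding is stored under feature with 512 floats passes. A document using vector, or carrying 768 values, is flagged.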