atulbhushan

Reputation: 23

Solr 9: Dense Vector input being converted to pdouble format

I passed a dense vector to Solr 9 for indexing, but Solr stores the values in a field of type pdoubles. I changed managed-schema.xml to declare a field named vector of type knn_vector, but Solr dynamically created a new field named vectors of type pdoubles.

Lines that I added to managed-schema.xml:

<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="768" similarityFunction="euclidean"/>

<field name="vector" type="knn_vector" indexed="true" stored="true"/>

Lines added dynamically by Solr itself:

<field name="vectors" type="pdoubles"/>

For reference, my code:

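import numpy as np
import pysolr
from sentence_transformers import SentenceTransformer

# documents is a dict keyed by '0', '1', ... whose values hold a 'paragraph' entry (defined elsewhere).
# Encode every paragraph into an embedding.
embedder = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')
corpus = [documents[d]['paragraph'] for d in documents]
corpus_embeddings = embedder.encode(corpus, convert_to_tensor=False)

# Attach each embedding to its document as a plain list of floats.
for d, row in enumerate(corpus_embeddings):
    documents[str(d)]['vectors'] = np.array(row).tolist()

solr = pysolr.Solr('http://localhost:8983/solr/VectorPilotRun/', always_commit=True, timeout=10)

# Query the knn_vector field with the embedding of document 500.
results = solr.search("{!knn f=vector topK=10}" + str(documents['500']['vectors']))
print("Saw {0} result(s).".format(len(results)))
for result in results:
    print("The details are : '{0} {1} {2}'\n.".format(result['id'], result['paragraph'], result['paragraph_num']))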

The result of this search is null.

When I query the knn_vector field (vector), I get no results. I believe this is because all the data is stored in the vectors (pdoubles) field instead of the vector (knn_vector) field.

How do I add data so that it is stored in the correct field with the correct type, rather than being dynamically mapped to another type? I used pysolr to add the data, and each vector is a list of float values.

Upvotes: 1

Views: 882

Answers (1)

Nha Duong

Reputation: 11

Besides managed-schema.xml, you should add a separate schema.xml file to the config directory before creating the Solr collection.

For example,

<schema name="visual-search1" version="1.0">
  <fieldType name="string" class="solr.StrField" omitNorms="true" positionIncrementGap="0"/>
  <!-- vector-based field -->
  <fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="512" similarityFunction="cosine" omitNorms="true"/>
  <fieldType name="long" class="org.apache.solr.schema.LongPointField" docValues="true" omitNorms="true" positionIncrementGap="0"/>
  <!-- basic text field -->
  <fieldType name="text" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="false"/>
  <field name="image_path" type="text" indexed="true" stored="true"/>
  <field name="gender" type="text" indexed="true" stored="true"/>
  <field name="article_type" type="text" indexed="true" stored="true"/>
  <field name="color" type="text" indexed="true" stored="true"/>
  <field name="sub_category" type="text" indexed="true" stored="true"/>
  <field name="feature" type="knn_vector" indexed="true" stored="true" multiValued="false"/>
  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  <uniqueKey>id</uniqueKey>

</schema>
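Whichever schema file is used, the vector has to be sent under the exact field name declared with the knn_vector type. In the question the schema declares vector while the indexing code writes vectors, which would explain why Solr guessed a new pdoubles field. The following is a minimal sketch of building the documents and the query string with one shared field name. The helper names build_solr_docs and knn_query are hypothetical, and the field name vector is taken from the question's schema.

```python
# Hypothetical helpers; VECTOR_FIELD is assumed to match the
# <field name="vector" type="knn_vector" .../> line from the question.
VECTOR_FIELD = "vector"


def build_solr_docs(documents, embeddings, vector_field=VECTOR_FIELD):
    """Return Solr documents with each embedding stored under vector_field."""
    docs = []
    for (doc_id, doc), row in zip(documents.items(), embeddings):
        solr_doc = dict(doc)              # copy so the input is not mutated
        solr_doc.setdefault("id", doc_id)  # use the dict key as id if none is set
        solr_doc[vector_field] = [float(x) for x in row]
        docs.append(solr_doc)
    return docs


def knn_query(vector, field=VECTOR_FIELD, top_k=10):
    """Format a query string for Solr's {!knn} query parser."""
    values = ", ".join(repr(float(x)) for x in vector)
    return "{!knn f=%s topK=%d}[%s]" % (field, top_k, values)
```

With pysolr, these would be used roughly as solr.add(build_solr_docs(documents, corpus_embeddings)) followed by solr.search(knn_query(corpus_embeddings[500])), assuming each embedding has the same length as the vectorDimension configured on the field type.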

Upvotes: 1
