Babu

Reputation: 5250

Solr automatically generated id doesn't work

I want automatically generated ids for my Solr documents. I did it exactly as described in the Solr Cookbook, but it doesn't work. I get this exception (running the default setup on Jetty):

ERROR org.apache.solr.core.CoreContainer  – Unable to create core: collection1
org.apache.solr.common.SolrException: QueryElevationComponent requires the schema to have a uniqueKeyField.
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreConta

Did I miss something?

My schema.xml:

    <?xml version="1.0" encoding="UTF-8" ?>
<schema name="transcripts" version="1.5"> 

<fields>   
   <field name="id" type="uuid" indexed="true" stored="true" default="NEW" required="true"/>
   <field name="stime" type="long" indexed="true" stored="true" required="true" multiValued="false"/>
   <field name="etime" type="long" indexed="true" stored="true" required="true" multiValued="false"/>
   <field name="speakerid" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
   <field name="speakergender" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
   <field name="videoid" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
   <field name="transcriptLIUM" type="text_en_splitting" indexed="true" stored="true" multiValued="false" required="false"/>
   <field name="transcriptLIMSI" type="text_en_splitting" indexed="true" stored="true" multiValued="false" required="true"/>

  <field name="_version_" type="long" indexed="true" stored="true"/>
 </fields>

 <types>
  <fieldType name="uuid" class="solr.UUIDField" indexed="true" /> 
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" /> 

    <!-- A text field with defaults appropriate for English, plus
     aggressive word-splitting and autophrase features enabled.
     This field is just like text_en, except it adds
     WordDelimiterFilter to enable splitting and matching of
     words on case-change, alpha numeric boundaries, and
     non-alphanumeric chars.  This means certain compound word
     cases will work, for example query "wi fi" will match
     document "WiFi" or "wi-fi".
        -->

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- TODO zde nahradi nas THD tokenizer - use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->

        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Less flexible matching, but less false matches.  Probably not ideal for product names,
         but may be good for SKUs.  Can insert dashes in the wrong place and still match. -->
    <fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
             possible with WordDelimiterFilter in conjuncton with stemming. -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>  
 </types>   

</schema>

Upvotes: 0

Views: 1344

Answers (3)

Jan Rasehorn

Reputation: 311

According to the Solr docs, default="NEW" should not be used when the UUID field is also meant to serve as the unique key. They also advise against default="NEW" in SolrCloud environments, since it would lead to a different UUID being generated on every replica.

Instead, use the 'UUIDUpdateProcessorFactory' to generate the id in an update processor chain.

The following Stack Overflow thread contains a hint on how the processor chain should be configured: Configuring Solr to use UUID as a key

If you do not intend to define a custom request handler, you can pass the chain name as the update.chain query parameter on the update URL, e.g. http://<host>:<port>/solr/<core>/update?commit=true&update.chain=<your chain name>
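
As a rough illustration of such a chain, a solrconfig.xml fragment might look like the sketch below. The chain name `uuid` is an assumption made for this example (it is whatever you later pass as update.chain), and `id` is the field name from the schema in the question; check the factory's options against the documentation for your Solr version.

```xml
<!-- solrconfig.xml: sketch of an update chain that fills in a UUID.
     The chain name "uuid" is hypothetical; "id" is the field from the question's schema. -->
<updateRequestProcessorChain name="uuid">
  <!-- Generates a UUID for "id" when the incoming document has no value for it -->
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory must come last; it performs the actual update -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With a chain named like this, the update URL above would end in update.chain=uuid.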

Upvotes: 0

Jayendra

Reputation: 52799

Query elevation requires a unique key element to be defined in schema.xml:

<uniqueKey>fileid</uniqueKey>

Also, the unique key must actually be unique; in your case the default is NEW, which may not be unique.
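
Applied to the schema in the question, where the key field is called `id` rather than `fileid`, the declaration might look like the sketch below. Dropping default="NEW" is an assumption based on the concern above (as far as I know, Solr 4.x rejects a default value on the uniqueKey field); without it, the value has to come from the client or from an update processor.

```xml
<!-- schema.xml: sketch only, using the field name from the question -->
<fields>
  <!-- no default="NEW" here; the id must then be supplied with each document -->
  <field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false"/>
  <!-- ... remaining fields unchanged ... -->
</fields>

<!-- declared once, as a direct child of <schema> and outside <fields> -->
<uniqueKey>id</uniqueKey>
```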

Also note:

  1. Solr does not require a unique key and works fine without one.
  2. If you don't need the query elevation component, you can simply get rid of it.
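
For the second point, the component is registered in solrconfig.xml. In the stock example configuration the relevant entries look roughly like the fragment below; the exact names and defaults in your file may differ, so treat this as a sketch of what to look for. Removing or commenting out both elements should let the core start without a uniqueKey.

```xml
<!-- solrconfig.xml: entries to remove or comment out (approximate, based on the stock example config) -->
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<!-- the handler that references the component has to go as well -->
<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="last-components">
    <str>elevator</str>
  </arr>
</requestHandler>
```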

Upvotes: 0

Okke Klein

Reputation: 2549

If you want to keep query elevation, read the UniqueKey wiki page, especially the "UUID techniques" section.

Upvotes: 1
