Anurag.D

Reputation: 21

Solr Language Detection

I have a field "text", which I need to copy to text_en or text_es based on the language of "text". Below is my managed_schema.xml:

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
    <bool name="langid">true</bool>
    <str name="langid.fl">text</str>
    <str name="langid.langField">tweet_lang</str>
    <str name="langid.whitelist">es,en</str>
    <bool name="langid.map">true</bool>
    <!--bool name="langid.map.individual">true</bool-->
    <str name="langid.map.individual.fl">text</str>
    <bool name="langid.map.keepOrig">true</bool>
    <str name="langid.fallback">ko</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

I created copy fields text_en and text_es. When I post data in Spanish, it is copied from text to both text_en and text_es!

How do I solve this?

Thanks!

Upvotes: 1

Views: 1140

Answers (2)

Anurag.D

Reputation: 21

Thanks for the heads-up! The issue was solved by removing the copy fields and creating dynamic fields

  • *_es and
  • *_en in schema.xml
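
For anyone landing here later, a minimal sketch of what such dynamic fields might look like. It assumes field types named text_en and text_es are already defined in the schema (they are in Solr's default configset, but the type names and the indexed/stored attributes here are illustrative and may differ in your setup):

```xml
<!-- Assumption: fieldTypes "text_en" and "text_es" already exist in this schema -->
<dynamicField name="*_en" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_es" type="text_es" indexed="true" stored="true"/>
```

With these in place, the text_en or text_es field produced by the language mapping matches a dynamic field, so no copyField from text is needed.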

Upvotes: 0

EricLavault

Reputation: 16095

By creating copyFields from text to text_en and text_es, you get incoming data in both fields regardless of the language detection; that is what copyField is supposed to do.

The updateRequestProcessor will actually make a copy (rather than a move) because you set <bool name="langid.map.keepOrig">true</bool>.
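
If you would rather have the content moved than copied, a sketch of the variant (these two settings go inside the same processor element as in the question; everything else stays unchanged) would be:

```xml
<!-- Sketch: with keepOrig set to false, the original "text" field is not
     preserved; its content ends up only in the mapped field (e.g. text_es) -->
<bool name="langid.map">true</bool>
<bool name="langid.map.keepOrig">false</bool>
```

Whether to keep the original depends on whether anything else still queries or displays the plain text field.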

Other than that, the processor's config looks fine. Just remove these copyFields and make sure the mapped fields text_en and text_es are properly defined in your schema.

Upvotes: 1
