Jackson

Reputation: 1526

How to search Chinese characters with Solr?

I'm working on a Drupal site that uses Solr as its search engine. Searches for some Simplified Chinese words and characters return results, but others do not, for example:

美国:为美朝峰会同朝鲜进行的磋商取得进展

Text like this is not found when I search for its individual characters.

So I went through both of these:

https://lucene.apache.org/solr/guide/7_4/language-analysis.html
http://www.opencms-wiki.org/wiki/Solr_-_configuration_for_Chinese_and_correct_results_for_german_umlauts

and my Solr schema file now contains the following:

<fieldType name="text_chinese" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>
  <analyzer>
      <tokenizer class="solr.HMMChineseTokenizerFactory"/>
      <filter class="solr.CJKWidthFilterFactory"/>
      <filter class="solr.StopFilterFactory"
              words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>

This gives the following error:

local: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core local: Plugin init failure for [schema.xml] fieldType "text_chinese": Cannot load analyzer: org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer. Schema file is /var/solr/cores/local/conf/schema.xml

and the search still returns no results.

I'm not sure whether I'm missing something in the config.

Upvotes: 3

Views: 1302

Answers (1)

MatsLindh

Reputation: 52802

The error message is telling you that Solr isn't able to find the implementing class of the analyzer you have defined: "Cannot load analyzer: org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer".

The SmartCN analyzer isn't loaded by default, but it's included in the binary build under contrib/analysis-extras/lucene-libs/lucene-analyzers-smartcn-<version number>.jar.

Add the directory to the list of directories that Solr can load libraries from in solrconfig.xml:

<lib dir="../../../contrib/analysis-extras/lucene-libs" regex=".*smartcn.*\.jar" />
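Relative dir values in a lib directive are resolved against the core's instance directory, so the number of ../ segments depends on where the core lives. As a sketch, assuming your install sets the solr.install.dir system property (the stock start scripts and sample configs rely on it, but check your own setup), the same directive can be written without counting directory levels:

```xml
<!-- solrconfig.xml -->
<!-- Assumption: solr.install.dir is defined; the value after the colon is
     only a fallback relative to the core's instance directory. -->
<lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lucene-libs"
     regex=".*smartcn.*\.jar" />
```

Once the jar loads, the field type itself probably needs only one analyzer definition. The version below keeps the tokenizer and filter chain from the question (which matches the reference guide example) and drops the separate class-based analyzer element; whether that is the variant you want is an assumption on my part:

```xml
<!-- schema.xml -->
<!-- Single analyzer chain; HMMChineseTokenizerFactory and the stopword list
     both come from the smartcn jar loaded via the lib directive. -->
<fieldType name="text_chinese" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After changing either file, reload the core and reindex the content so existing documents are analyzed with the new chain.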

Upvotes: 5
