vladimir

Reputation: 705

Solr non-English indexing and search

I am new to Solr and have a problem. I load data into Solr via XML, and the data is in German, for example:

<?xml version="1.0" encoding="utf-8" ?>
<add>
<doc>
  <field name="id">1</field>
  <field name="name">Größen helfen, ihr Potenzial voll zu entfalten. Sicherheit und Zuverlässigkeit, Innovation und Integration sowie</field>
</doc>
</add>

This document is saved successfully. When I search from the admin panel with the query "name:*", it is returned, but when I search with the query "name:*uverlässigkeit*", it is not. I think this is a problem with the German language, but I don't know how to fix it. Could anybody help me understand what is wrong?

Upvotes: 1

Views: 1013

Answers (3)

Jayendra

Reputation: 52769

Which request handler are you using?
The standard request handler does not support leading wildcard queries, so name:*uverlässigkeit* would not work.

If you want to use leading wildcard queries, look at the Extended DisMax (edismax) query parser, which allows leading wildcards. However, wildcards always carry a performance cost.
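
As a rough sketch of what switching parsers could look like, the fragment below makes edismax the default query parser for a handler in solrconfig.xml. The handler name /select and the qf value are assumptions based on the name field from the question, so adjust them to your setup:

```xml
<!-- solrconfig.xml: hypothetical handler that defaults to the edismax parser -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- use Extended DisMax instead of the standard parser -->
    <str name="defType">edismax</str>
    <!-- field(s) to search; "name" is taken from the question's document -->
    <str name="qf">name</str>
  </lst>
</requestHandler>
```

The parser can also be selected for a single request by passing defType=edismax as a query parameter.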

Matching non-ASCII characters works fine in Solr. However, the analysis must be consistent at index and query time if you use ASCII folding or the ISO Latin-1 accent filter.

Also, as fiskfisk mentioned, you need to configure the encoding in Tomcat if you are using it as the web container.

Upvotes: 0

hupf

Reputation: 604

Alternatively, it might also be a good idea to use the following filter in your query/index analyzer:

<filter class="solr.ASCIIFoldingFilterFactory"/>

This replaces German umlauts with their closest ASCII equivalents (for example, ä becomes a), which improves matching.
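
A minimal sketch of a field type that applies this filter in both analyzers, so indexed and queried terms are folded the same way. The type name text_folded, the tokenizer, and the lowercase filter are assumptions for illustration, not part of the original setup:

```xml
<!-- schema.xml: hypothetical field type with identical folding at index and query time -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- folds ä, ö, ü to a, o, u and ß to ss -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<!-- the "name" field from the question, switched to the folded type -->
<field name="name" type="text_folded" indexed="true" stored="true"/>
```

Existing documents have to be reindexed after a schema change like this, because the terms already in the index were produced by the old analyzer.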

Upvotes: 0

MatsLindh

Reputation: 52792

You can't perform searches starting with a wildcard; only trailing (postfix) wildcards are allowed in a search query, as a leading wildcard would otherwise have to scan all the indexed terms. If you need to match on the end of a term, index the term reversed and then search against that field with a postfix wildcard. Be aware that a reversed field might throw other functionality off if you use it without giving much thought to what you're searching.
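
The reversed-term idea does not have to be implemented by hand: Solr versions that ship solr.ReversedWildcardFilterFactory can do the reversal at index time. The sketch below is modelled on the reversed text type found in Solr's example schemas; the type name text_rev and the attribute values are assumptions, not settings from the question:

```xml
<!-- schema.xml: hypothetical field type that also indexes reversed tokens -->
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index each token reversed; withOriginal keeps the normal token as well -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <!-- no reversal here: the reversed filter belongs in the index analyzer only -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With withOriginal="true" both the original and the reversed token are indexed, so ordinary queries on the field keep working at the cost of a larger index. When the query parser sees this filter in the field's index analyzer, it can rewrite a leading-wildcard query to run against the reversed tokens.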

Also be aware that your application container (e.g. Jetty or Tomcat) has to be UTF-8 aware (for Tomcat you have to configure this explicitly) for a search against UTF-8 strings to work properly.
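
For Tomcat, this is usually done with the URIEncoding attribute on the HTTP connector in conf/server.xml. A sketch, where the port and the other attributes are assumed defaults rather than values from the question:

```xml
<!-- conf/server.xml: decode request URIs, including query strings, as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```

Tomcat has to be restarted for the connector change to take effect.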

Upvotes: 1
