hariii

Reputation: 89

Cannot tokenize multi-word synonyms when using Lucene SynonymFilter

public class SynonymAnalyzer extends Analyzer {


    @Override
    protected TokenStreamComponents createComponents(String s, Reader reader) {
        SynonymMap synonymMap = null;
        SynonymMap.Builder builder=null;
        try {
            addTo(builder,new String[]{"dns"},new String[]{"domain name system"});
            synonymMap = builder.build();
        }catch (Exception e) {
            e.printStackTrace();
        }
        Tokenizer tokenizer = new StandardTokenizer(reader);
        TokenStream filter = new SynonymFilter(tokenizer, synonymMap, true);
        return new TokenStreamComponents(tokenizer, filter);
    }

     private void addTo(SynonymMap.Builder builder, String[] from, String[] to) {
         for (String input : from) {
             for (String output : to) {
                 builder.add(new CharsRef(input), new CharsRef(output), false);
             }
         }
     }
 }

If I use this SynonymAnalyzer and search for "dns is down", the query formed is +n:domain name system +n:is +n:down. The synonym "domain name system" comes out as a single token, but I need it split into separate tokens.

Upvotes: 0

Views: 97

Answers (1)

femtoRgon

Reputation: 33351

When adding multi-word synonyms, you need to separate words with SynonymMap.WORD_SEPARATOR:

addTo(builder,new String[]{"dns"},new String[]{
    "domain" + SynonymMap.WORD_SEPARATOR
    + "name" + SynonymMap.WORD_SEPARATOR
    + "system"});

(By the way, your createComponents, as written, will throw an NPE: builder is never instantiated, so the call to builder.add fails and synonymMap is left null. Judging by what you've written, I'll assume this is an error in the example, not in your production code.)
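If you have many multi-word synonyms, the concatenation in the entry above gets repetitive. Here is a minimal sketch of a small helper that joins the words of a phrase with a separator you pass in. SynonymPhrases and joinWords are hypothetical names, not part of Lucene, and the helper knows nothing about Lucene itself; it only builds the string.

```java
// Hypothetical helper (not part of Lucene): joins the words of a phrase
// with a caller-supplied separator character.
final class SynonymPhrases {
    private SynonymPhrases() {}

    // Returns the words concatenated with `separator` between each pair.
    // A single word is returned unchanged.
    static String joinWords(char separator, String... words) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(words[i]);
        }
        return sb.toString();
    }
}
```

With that in place, the call from the question could be written as addTo(builder, new String[]{"dns"}, new String[]{SynonymPhrases.joinWords(SynonymMap.WORD_SEPARATOR, "domain", "name", "system")}), assuming WORD_SEPARATOR is a char constant as in the Lucene versions I've looked at.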

Upvotes: 1
