How to use Wordnet Synonyms with Hibernate Search?

Question

I've been trying to figure out how to use WordNet synonyms with a search function I'm developing which uses Hibernate Search 5.6.1. At first, I thought about using Hibernate Search annotations:

@TokenFilterDef(factory = SynonymFilterFactory.class, params = {@Parameter(name = "ignoreCase", value = "true"),
  @Parameter(name = "expand", value = "true"),@Parameter(name = "synonyms", value = "synonymsfile") })

However, this requires an actual file populated with synonyms. From WordNet I was only able to get ".pl" files. So I tried manually making a SynonymAnalyzer class which would read from the ".pl" file:

public class SynonymAnalyzer extends Analyzer {

@Override
protected TokenStreamComponents createComponents(String fieldName) {
  final Tokenizer source = new StandardTokenizer();
  TokenStream result = new StandardFilter(source);
  result = new LowerCaseFilter(result);

  SynonymMap wordnetSynonyms = null;

  try {
    wordnetSynonyms = loadSynonyms();
  } catch (IOException e) {
    e.printStackTrace();
  }
  result = new SynonymFilter(result, wordnetSynonyms, false);
  result = new StopFilter(result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
  return new TokenStreamComponents(source, result);
}

private SynonymMap loadSynonyms() throws IOException {
  File file = new File("synonyms\wn_s.pl");
  InputStream stream = new FileInputStream(file);
  Reader reader = new InputStreamReader(stream);
  SynonymMap.Builder parser = null;
  parser = new WordnetSynonymParser(true, true, new StandardAnalyzer(CharArraySet.EMPTY_SET));
  try {
    ((WordnetSynonymParser) parser).parse(reader);
  }   catch (ParseException e) {
    e.printStackTrace();
  }

  return parser.build();
}

}

The problem with this method is that I'm getting java.lang.OutOfMemoryError which I'm assuming is because there's too many synonyms or something? What is the proper way to do this, everywhere I've looked online has suggested using WordNet but I can't seem to find an example with Hibernate Search Annotations. Any help is appreciated, thanks!

yrodiere · Accepted Answer

The wordnet format is actually supported by SynonymFilterFactory. You're simply missing the "format" parameter in your annotation configuration; by default, the factory uses the Solr format.

Change your annotation to this:

@TokenFilterDef(
    factory = SynonymFilterFactory.class,
    params = {
        @Parameter(name = "ignoreCase", value = "true"),
        @Parameter(name = "expand", value = "true"),
        @Parameter(name = "synonyms", value = "synonymsfile"),
        @Parameter(name = "format", value = "wordnet") // Add this
    }
)

Also, make sure that the value of the "synonyms" parameter is the path of a file in your classpath (e.g. "com/acme/synonyms.pl", or just "synonyms.pl" if the file is at the root of your "resources" directory).

In general when you have an issue with the parameters of a Lucene filter/tokenizer factory, your best bet is having a look at the source code of that factory, or having a look at this page.

How to use Wordnet Synonyms with Hibernate Search?

Answers (1)

Related Questions