Reputation: 2721

How to implement a basic Analyzer in Lucene 4.2.1?

Lucene 4.2.1 does not have StandardAnalyzer, and I am not sure how to implement a basic analyzer that does not alter the source text. Any pointers?

final SimpleFSDirectory DIRECTORY = new SimpleFSDirectory(new File(ELEMENTS_INDEX_DIR));
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_42, new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String s, Reader reader) {
        return null;
    }
});
IndexWriter indexWriter = new IndexWriter(DIRECTORY, indexWriterConfig);
List<ModelObject> elements = dao.getAll();
for (ModelObject element : elements) {
    Document document = new Document();
    document.add(new StringField("id", String.valueOf(element.getId()), Field.Store.YES));
    document.add(new TextField("name", element.getName(), Field.Store.YES));
    indexWriter.addDocument(document);
}
indexWriter.close();

Upvotes: 3

Answers (2)

Dorian

Reputation: 1078

You should add the Common Analyzers to your project. They are now available in a separate JAR file in the Lucene-4.2.1.zip file under "analysis/common".

 lucene-analyzers-common-4.*.jar

Once you add it to your project (as you added the core) you should have this working:

import org.apache.lucene.analysis.standard.StandardAnalyzer;

Upvotes: 2

femtoRgon

Reputation: 33351

You have to return a TokenStreamComponents from createComponents. null is not adequate.

However, Lucene 4.2.1 certainly does have StandardAnalyzer.

If you are, perhaps, referring to the changes in StandardAnalyzer in Lucene 4.x, and are looking for the old StandardAnalyzer, then you'll want ClassicAnalyzer.

If you really want a trimmed down Analyzer that doesn't modify anything, but just tokenizes in a very simple fashion, perhaps WhitespaceAnalyzer will serve your purposes.

If you don't want it modified or tokenized at all, then use KeywordAnalyzer.
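To make the difference between the last two options concrete, here is a minimal sketch that prints the tokens each analyzer produces. It assumes lucene-core and lucene-analyzers-common 4.2.1 are on the classpath; the class name BuiltInAnalyzerDemo, the helper tokens, the field name "name", and the sample text are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical demo class, not part of Lucene.
public class BuiltInAnalyzerDemo {

    // Runs the text through the analyzer and collects the emitted terms.
    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> result = new ArrayList<String>();
        // "name" is an arbitrary field name; these analyzers ignore it.
        TokenStream stream = analyzer.tokenStream("name", new StringReader(text));
        try {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                      // required before the first incrementToken()
            while (stream.incrementToken()) {
                result.add(term.toString());
            }
            stream.end();
        } finally {
            stream.close();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String text = "The Quick-Brown fox";

        // Splits on whitespace only; case and punctuation are left alone.
        System.out.println(tokens(new WhitespaceAnalyzer(Version.LUCENE_42), text));
        // prints [The, Quick-Brown, fox]

        // Emits the whole input as a single token.
        System.out.println(tokens(new KeywordAnalyzer(), text));
        // prints [The Quick-Brown fox]
    }
}
```

The same helper can be pointed at StandardAnalyzer or ClassicAnalyzer (both take a Version argument) to see how much more they change the text.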

And if you must create your very own Analyzer, as you say, then override the method createComponents, and actually build and return an instance of TokenStreamComponents. If none of the above four serve your needs, I have no idea what your needs are, so I won't attempt a specific example here, but here is the example from the Analyzer docs:

Analyzer analyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new FooTokenizer(reader);
    TokenStream filter = new FooFilter(source);
    filter = new BarFilter(filter);
    return new TokenStreamComponents(source, filter);
  }
};

By the way, TokenStreamComponents also has a single-argument constructor that takes only the Tokenizer, so the filter is optional.
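As a rough sketch of what that could look like for the question's use case, the following custom Analyzer uses the single-argument constructor with a WhitespaceTokenizer and no filters, then indexes one document and runs an exact-term query against it. It assumes lucene-core and lucene-analyzers-common 4.2.1 are on the classpath; the class name PassThroughAnalyzerDemo, the helpers passThroughAnalyzer and countHits, and the in-memory RAMDirectory (used instead of the question's SimpleFSDirectory) are choices made only for this illustration.

```java
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo class, not part of Lucene.
public class PassThroughAnalyzerDemo {

    // Tokenizer only, no filters: terms are indexed exactly as they appear.
    static Analyzer passThroughAnalyzer() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
                return new TokenStreamComponents(new WhitespaceTokenizer(Version.LUCENE_42, reader));
            }
        };
    }

    // Indexes one document with the given text and counts exact matches for the term.
    static int countHits(String text, String term) throws IOException {
        Directory directory = new RAMDirectory();   // in-memory index, for the demo only
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_42, passThroughAnalyzer());
        IndexWriter writer = new IndexWriter(directory, config);
        Document document = new Document();
        document.add(new TextField("name", text, Field.Store.YES));
        writer.addDocument(document);
        writer.close();

        DirectoryReader reader = DirectoryReader.open(directory);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            return searcher.search(new TermQuery(new Term("name", term)), 1).totalHits;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countHits("Hello World", "Hello")); // prints 1: case is preserved
        System.out.println(countHits("Hello World", "hello")); // prints 0: nothing was lowercased
    }
}
```

Because no LowerCaseFilter or stop filter is chained after the tokenizer, the indexed terms match the source text character for character, which is why the lowercase query finds nothing.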

Upvotes: 9
