John Russell

Reputation: 1165

How to use Lucene library to extract n-grams?

I am having a rough time trying to wrap my head around the Lucene library. This is what I have so far:

public void shingleMe()
{

    try
    {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        FileReader reader = new FileReader("test.txt");

        ShingleAnalyzerWrapper shingleAnalyzer = new ShingleAnalyzerWrapper(analyzer, 2);
        shingleAnalyzer.setOutputUnigrams(false);

        TokenStream stream = shingleAnalyzer.tokenStream("contents", reader);
        CharTermAttribute charTermAttribute = stream.getAttribute(CharTermAttribute.class);         

        while (stream.incrementToken())
        {
            System.out.println(charTermAttribute.toString());
        }

    }

    catch (FileNotFoundException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    catch (IOException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

}

It fails at stream.incrementToken(). It's my understanding that the ShingleAnalyzerWrapper wraps another Analyzer to produce a shingle-emitting analyzer. From there, I get a token stream from it and iterate the tokens, reading each shingle through a CharTermAttribute. However, it always results in this exception:

Exception in thread "main" java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z

Thoughts? Thanks in advance!

Upvotes: 2

Views: 1629

Answers (1)

Marko Topolnik

Reputation: 200158

AbstractMethodError cannot occur as a result of wrong API usage -- it must be the result of compiling against one JAR and then running against a different one. Since you are using both the Lucene Core and Lucene Analyzers JARs here, double-check that your compile-time and runtime classpaths point at the same versions of both JARs.
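
Once the JAR versions agree, your loop should run as written. As a sanity check, here is a minimal self-contained sketch of the same shingle extraction against Lucene 3.5 (the sample sentence is made up, and addAttribute is used instead of getAttribute so the attribute is created if the stream does not already carry one):

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ShingleDemo {
    public static void main(String[] args) throws IOException {
        // lucene-core-3.5.0.jar and lucene-analyzers-3.5.0.jar must both be
        // on the compile-time AND runtime classpath, from the same release,
        // or you can hit AbstractMethodError at incrementToken().
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        ShingleAnalyzerWrapper shingleAnalyzer = new ShingleAnalyzerWrapper(analyzer, 2);
        shingleAnalyzer.setOutputUnigrams(false); // emit only the 2-grams, not single terms

        TokenStream stream = shingleAnalyzer.tokenStream("contents",
                new StringReader("please divide this sentence into shingles"));
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);

        // Each incrementToken() advances to the next shingle, e.g. "please divide".
        while (stream.incrementToken()) {
            System.out.println(term.toString());
        }
        stream.close();
    }
}
```

If you build with Maven or a similar tool, pinning both artifacts to the same version (here 3.5.0) in one place is the easiest way to keep the two classpaths consistent.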

Upvotes: 3
