Reputation: 679
I need to index bi-grams of words (tokens) in Lucene. I could produce the n-grams myself and then index them, but I am wondering whether Lucene has something that will do this for me. As far as I can tell, Lucene only indexes n-grams of characters. Any ideas?
Upvotes: 6
Views: 2344
Reputation: 11
The class you are looking for is ShingleFilter, which builds word-level n-grams ("shingles") from a token stream: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/shingle/ShingleFilter.html
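To make it concrete what word bi-grams (shingles) look like, here is a minimal plain-Java sketch that does not use Lucene at all. The class name `WordBigrams` and the method `bigrams` are made up for illustration, and splitting on whitespace is an assumption standing in for whatever tokenizer your analyzer uses. It is not how ShingleFilter is implemented, only a picture of the kind of terms it produces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds word bi-grams from a string.
class WordBigrams {

    // Splits the text on whitespace and joins each pair of adjacent tokens
    // with a single space. Returns an empty list for fewer than two tokens.
    static List<String> bigrams(String text) {
        List<String> result = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            result.add(tokens[i] + " " + tokens[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [please divide, divide this, this sentence, sentence into, into shingles]
        System.out.println(bigrams("please divide this sentence into shingles"));
    }
}
```

In Lucene itself you would wrap your tokenizer's TokenStream in a ShingleFilter instead of doing this by hand. As far as I know it emits the single tokens alongside the shingles by default, so check the Javadoc for the setting that controls this if you only want bi-grams.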
Upvotes: 1
Reputation: 71939
Depending on why you need to index bi-grams, SpanQuery and/or SnowballAnalyzer may be helpful.
Upvotes: 0