Reputation: 1131
I am trying to implement fuzzy phrase search (matching misspelled words) with Lucene. After reading various blogs, I thought I would try n-gram indexes for this.
However, I couldn't find an n-gram tokenizer in my Lucene 3.4 JAR. Has it been deprecated and replaced with something else? I am currently using StandardAnalyzer, which gives me decent results for exact term matches.
I have the two requirements below.
1. My index has a document containing the phrase "xyz abc pqr". When I query "abc xyz"~5, I get that document back. I also need to match the same document when the query contains one extra word, for example "abc xyz pqr tst" (I understand the score will be slightly lower). With a proximity query, the extra word prevents any match. If I drop the proximity operator and the double quotes, I get the expected document, but also many false positives, such as documents containing only xyz or only abc.
2. In the same example, if somebody misspells the query as "abc xxz", I still want to get the same document.
I want to give n-grams a try, but I'm not sure they will work as expected.
Any thoughts?
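To make the n-gram idea concrete, here is a minimal plain-Java sketch of character n-gram overlap. It does not use Lucene at all: the class `NGramSketch` and its methods `ngrams` and `sharedGrams` are hypothetical names made up for illustration, and the word pair "phrase"/"phrsae" is an invented example. It only shows which n-grams a correct and a misspelled term would have in common.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; this is not Lucene's n-gram tokenizer.
class NGramSketch {

    // Returns the set of character n-grams of length n contained in text.
    static Set<String> ngrams(String text, int n) {
        Set<String> grams = new HashSet<String>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // Counts the distinct n-grams that two strings have in common.
    static int sharedGrams(String a, String b, int n) {
        Set<String> shared = ngrams(a, n);
        shared.retainAll(ngrams(b, n));
        return shared.size();
    }

    public static void main(String[] args) {
        // "xyz" -> {xy, yz}, "xxz" -> {xx, xz}: no bigram in common.
        System.out.println(sharedGrams("xyz", "xxz", 2));       // 0
        // "phrase" and "phrsae" share the bigrams "ph" and "hr".
        System.out.println(sharedGrams("phrase", "phrsae", 2)); // 2
    }
}
```

Note that for a three-letter term with a wrong middle letter, such as "xyz" versus "xxz", the bigram sets do not overlap at all, so n-grams alone may not cover very short terms.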
Upvotes: 2
Views: 2313
Reputation: 66
Try combining BooleanQuery and FuzzyQuery, like this:
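import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;

// Assumes two fields defined elsewhere in the class: `index` (the Directory
// holding the index) and `NAME` (the name of the indexed field).
public void fuzzysearch(String querystr) throws Exception {
    System.out.println("\n\n-------- Start fuzzysearch -------- ");
    int hitsPerPage = 10;
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    IndexReader reader = IndexReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);
    try {
        // Build one FuzzyQuery per word; split on any whitespace so that
        // repeated spaces do not produce empty terms.
        BooleanQuery bq = new BooleanQuery();
        String[] searchWords = querystr.toLowerCase().trim().split("\\s+");
        for (int i = 0; i < searchWords.length; i++) {
            Query query = new FuzzyQuery(new Term(NAME, searchWords[i]));
            // The first word is required; the others are optional and only raise the score.
            bq.add(query, i == 0 ? BooleanClause.Occur.MUST : BooleanClause.Occur.SHOULD);
        }
        System.out.println("query ==> " + bq.toString());
        searcher.search(bq, collector);
        parseResults(searcher, collector);
    } finally {
        searcher.close();
        reader.close(); // closing the searcher does not close a reader that was passed in
    }
}

public void parseResults(IndexSearcher searcher, TopScoreDocCollector collector) throws Exception {
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    // Print the stored NAME field of each hit, best match first.
    System.out.println("Found " + hits.length + " hits.");
    for (int i = 0; i < hits.length; ++i) {
        int docId = hits[i].doc;
        Document d = searcher.doc(docId);
        System.out.println((i + 1) + ". " + d.get(NAME));
    }
}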
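For intuition about why FuzzyQuery can match a misspelling such as "xxz": its similarity measure is documented as being based on Levenshtein edit distance. The plain-Java sketch below computes that distance for the example from the question. The class `EditDistanceSketch` is a hypothetical illustration, not Lucene's implementation, and it does not reproduce how Lucene turns the distance into a similarity threshold.

```java
// Hypothetical helper for illustration only; not Lucene's FuzzyQuery code.
class EditDistanceSketch {

    // Classic dynamic-programming Levenshtein distance: the minimum number of
    // single-character insertions, deletions, or substitutions turning a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("xxz", "xyz")); // 1
        System.out.println(levenshtein("abc", "abc")); // 0
    }
}
```

The misspelled "xxz" is a single substitution away from the indexed term "xyz", which is the kind of small difference a fuzzy term query is meant to tolerate.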
Upvotes: 5