Reputation: 1351
I'd like to implement a searchable index using Lucene.Net 4.8 that supplies a user with suggestions / autocomplete for single words & phrases.
The index has been created successfully; the suggestions are where I've stalled.
Version 4.8 seems to have introduced a substantial number of breaking changes, and none of the available samples I've found work.
For reference, LuceneVersion
is this:
private readonly LuceneVersion LuceneVersion = LuceneVersion.LUCENE_48;
Solution 1
I've tried this, but can't get past reader.Terms
:
public void TryAutoComplete()
{
var analyzer = new EnglishAnalyzer(LuceneVersion);
var config = new IndexWriterConfig(LuceneVersion, analyzer);
RAMDirectory dir = new RAMDirectory();
using (IndexWriter iw = new IndexWriter(dir, config))
{
Document d = new Document();
TextField f = new TextField("text","",Field.Store.YES);
d.Add(f);
f.SetStringValue("abc");
iw.AddDocument(d);
f.SetStringValue("colorado");
iw.AddDocument(d);
f.SetStringValue("coloring book");
iw.AddDocument(d);
iw.Commit();
using (IndexReader reader = iw.GetReader(false))
{
TermEnum terms = reader.Terms(new Term("text", "co"));
int maxSuggestsCpt = 0;
// will print:
// colorado
// coloring book
do
{
Console.WriteLine(terms.Term.Text);
maxSuggestsCpt++;
if (maxSuggestsCpt >= 5)
break;
}
while (terms.Next() && terms.Term.Text.StartsWith("co"));
}
}
}
reader.Terms
no longer exists. Being new to Lucene, it's unclear how to refactor this.
Solution 2
Trying this, I'm thrown an error:
public void TryAutoComplete2()
{
using(var analyzer = new EnglishAnalyzer(LuceneVersion))
{
IndexWriterConfig config = new IndexWriterConfig(LuceneVersion, analyzer);
RAMDirectory dir = new RAMDirectory();
using(var iw = new IndexWriter(dir,config))
{
Document d = new Document()
{
new TextField("text", "this is a document with a some words",Field.Store.YES),
new Int32Field("id", 42, Field.Store.YES)
};
iw.AddDocument(d);
iw.Commit();
using (IndexReader reader = iw.GetReader(false))
using (SpellChecker speller = new SpellChecker(new RAMDirectory()))
{
//ERROR HERE!!!
speller.IndexDictionary(new LuceneDictionary(reader, "text"), config, false);
string[] suggestions = speller.SuggestSimilar("dcument", 5);
IndexSearcher searcher = new IndexSearcher(reader);
foreach (string suggestion in suggestions)
{
TopDocs docs = searcher.Search(new TermQuery(new Term("text", suggestion)), null, Int32.MaxValue);
foreach (var doc in docs.ScoreDocs)
{
System.Diagnostics.Debug.WriteLine(searcher.Doc(doc.Doc).Get("id"));
}
}
}
}
}
}
When debugging, speller.IndexDictionary(new LuceneDictionary(reader, "text"), config, false);
throws a The object cannot be set twice!
error, which I can't explain.
Any thoughts are welcome.
I'd like to return a list of suggested terms for a given input, not the documents or their full content.
For example, if a document contains "Hello, my name is Clark. I'm from Atlanta," and I submit "Atl," then "Atlanta" should come back as a suggestion.
Upvotes: 2
Views: 1837
Reputation: 1481
If I am understanding you correctly you may be over-complicating your index design a bit. If your goal is to use Lucene for auto-complete, you want to create an index of the terms you consider complete. Then simply query the index using a PrefixQuery using a partial word or phrase.
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.En;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;
using System;
using System.Linq;
namespace LuceneDemoApp
{
class LuceneAutoCompleteIndex : IDisposable
{
const LuceneVersion Version = LuceneVersion.LUCENE_48;
RAMDirectory Directory;
Analyzer Analyzer;
IndexWriterConfig WriterConfig;
private void IndexDoc(IndexWriter writer, string term)
{
Document doc = new Document();
doc.Add(new StringField(FieldName, term, Field.Store.YES));
writer.AddDocument(doc);
}
public LuceneAutoCompleteIndex(string fieldName, int maxResults)
{
FieldName = fieldName;
MaxResults = maxResults;
Directory = new RAMDirectory();
Analyzer = new EnglishAnalyzer(Version);
WriterConfig = new IndexWriterConfig(Version, Analyzer);
WriterConfig.OpenMode = OpenMode.CREATE_OR_APPEND;
}
public string FieldName { get; }
public int MaxResults { get; set; }
public void Add(string term)
{
using (var writer = new IndexWriter(Directory, WriterConfig))
{
IndexDoc(writer, term);
}
}
public void AddRange(string[] terms)
{
using (var writer = new IndexWriter(Directory, WriterConfig))
{
foreach (string term in terms)
{
IndexDoc(writer, term);
}
}
}
public string[] WhereStartsWith(string term)
{
using (var reader = DirectoryReader.Open(Directory))
{
IndexSearcher searcher = new IndexSearcher(reader);
var query = new PrefixQuery(new Term(FieldName, term));
TopDocs foundDocs = searcher.Search(query, MaxResults);
var matches = foundDocs.ScoreDocs
.Select(scoreDoc => searcher.Doc(scoreDoc.Doc).Get(FieldName))
.ToArray();
return matches;
}
}
public void Dispose()
{
Directory.Dispose();
Analyzer.Dispose();
}
}
}
Running this:
var indexValues = new string[] { "apple fruit", "appricot", "ape", "avacado", "banana", "pear" };
var index = new LuceneAutoCompleteIndex("fn", 10);
index.AddRange(indexValues);
var matches = index.WhereStartsWith("app");
foreach (var match in matches)
{
Console.WriteLine(match);
}
You get this:
apple fruit
appricot
Upvotes: 2