Dan Dinu

Reputation: 33438

Fuzzy substring matching in a string with Lucene.NET

I've just installed Lucene.NET.

I'm doing a text search. I want to check if a large text contains / fuzzy matches a word/phrase, say:

Eg1:

text: "I posted a question about Lucene.NET on stackoverflow. Will I get an asnwer?"

textToSearch: "posted a question abot Lucene"

These two should match, since text contains textToSearch (except for the small typo abot -> about).

Is this possible with Lucene.NET library?

If not, does it at least support single-word fuzzy matching in a text?

Eg2:

text: "I posted a question on stackoverflow"

textToSearch: "stackovrlow" (misspelled)

Upvotes: 1

Views: 1144

Answers (2)

femtoRgon

Reputation: 33351

Mysterion is right that span queries can be helpful, but wrong in saying that a FuzzyQuery cannot be used as a clause. That is what SpanMultiTermQueryWrapper is for. Something like this:

// Java Lucene syntax; adapt it to the Lucene.NET version you are using.
// "true" means the clauses must match in the given order.
SpanQuery query = new SpanNearQuery.Builder("myField", true)
    .addClause(new SpanMultiTermQueryWrapper<>(new FuzzyQuery(new Term("myField", "question"))))
    .addClause(new SpanMultiTermQueryWrapper<>(new FuzzyQuery(new Term("myField", "abot"))))
    .addClause(new SpanMultiTermQueryWrapper<>(new FuzzyQuery(new Term("myField", "lucene"))))
    .build();

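Remember, by the way, that when you construct queries manually (rather than using a query parser), you need to take analysis into account, because the terms you pass in will not be run through an analyzer. For example, if the field was indexed with a lowercasing analyzer, the query terms must already be lowercase.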
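To make the intended matching semantics concrete, here is a rough plain-Java sketch with no Lucene dependency. It is not how Lucene implements any of this; it only illustrates the idea of "each term within a small edit distance, in order, adjacent". The class name FuzzyPhraseSketch, the naive tokenizer (lowercase, split on non-alphanumerics) and the maxEdits parameter are all assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.Locale;

// Illustration only: a hypothetical helper, not part of Lucene or Lucene.NET.
final class FuzzyPhraseSketch {

    // Plain Levenshtein distance (insert, delete, substitute) using two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static String[] tokenize(String s) {
        return Arrays.stream(s.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    // True if the phrase tokens appear consecutively and in order in the text,
    // each within maxEdits edits of the corresponding text token.
    // Note: very short tokens (such as "a") match loosely when maxEdits is 2.
    static boolean containsFuzzyPhrase(String text, String phrase, int maxEdits) {
        String[] doc = tokenize(text);
        String[] query = tokenize(phrase);
        if (query.length == 0) {
            return false;
        }
        for (int start = 0; start + query.length <= doc.length; start++) {
            boolean all = true;
            for (int k = 0; k < query.length && all; k++) {
                all = editDistance(doc[start + k], query[k]) <= maxEdits;
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text1 = "I posted a question about Lucene.NET on stackoverflow. Will I get an asnwer?";
        String text2 = "I posted a question on stackoverflow";
        // Both examples from the question match with up to 2 edits per term.
        System.out.println(containsFuzzyPhrase(text1, "posted a question abot Lucene", 2)); // true
        System.out.println(containsFuzzyPhrase(text2, "stackovrlow", 2));                   // true
    }
}
```

"stackovrlow" is two deletions away from "stackoverflow", so it only matches if two edits per term are allowed; "abot" is one edit away from "about". The sketch also lowercases both sides, which is the kind of analysis step you have to apply yourself when building the span query by hand.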

Upvotes: 1

Mysterion

Reputation: 9320

Yes, Lucene.Net supports single-word fuzzy matching in a text. You can do it with FuzzyQuery.

Unfortunately, the behavior of the first example can only be approximated. One option is to build a big BooleanQuery in which each clause is a FuzzyQuery, but this performs poorly and also loses the order of the terms.

Another option is SpanNearQuery, which takes term positions into account (you can specify the allowed slop), but it does not accept a FuzzyQuery as a clause (you could only try something like SpanRegexQuery).

Upvotes: 1
