Rohit Banga

Reputation: 18916

Alternatives to Lucene Default Fuzzy Matching Implementation

Lucene's fuzzy matching is based on a basic edit-distance algorithm. Are there other fuzzy matching implementations for Lucene that use different similarity metrics? They should also identify homophones. Please also compare the various fuzzy matching approaches for Lucene.

Upvotes: 1

Views: 1103

Answers (2)

Chance Alexander

Reputation: 1

Something I've been doing is pretty simple and works in most scenarios. In my case, I have 6.7 million event names in a dirty table that contains slightly altered or drilled-down versions of event names, and the table I'm fuzzy matching against has all the clean event names.

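``-- x and y are placeholders for the substring start position and length.
-- With '=', a '%' on both sides is a literal character rather than a wildcard,
-- so the join compares the two substrings directly.
select distinct a.Column, b.Column
from tableA a
inner join tableB b
on SUBSTRING(b.Column, x, y) = SUBSTRING(a.Column, x, y)
order by a.Column asc;``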

My problem was that a plain fuzzy match with no substring returned only about 11 results, because the naming conventions in the two tables differ so much. This solution matches all of the drill-down-style events with their broader counterparts in the clean table.

Upvotes: 0

Mikos

Reputation: 8553

I don't think Lucene offers any other string matching algorithms; you can, however, add one yourself. Here is a good library that contains most well-known string comparison algorithms.
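As a rough illustration of what "adding one yourself" could look like for the homophone case, here is a minimal sketch of American Soundex in plain Java. The class name `Soundex` and the method `encode` are made up for this example and are not part of Lucene's API; how the codes would be wired into an index (for instance, by storing them in a separate field and matching on that) is an assumption left to the reader.

```java
// Hypothetical helper, not a Lucene class: encodes a word as a 4-character
// American Soundex code so that similar-sounding words share the same key.
final class Soundex {
    // Code for each letter A..Z. '0' marks vowels and Y (they separate
    // duplicate codes); '-' marks H and W (they do not separate duplicates).
    private static final String CODES = "0123012-02245501262301-202";

    static String encode(String word) {
        StringBuilder out = new StringBuilder(4);
        char prev = '0';
        for (int i = 0; i < word.length(); i++) {
            char c = Character.toUpperCase(word.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not an ASCII letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                prev = code;
            } else if (code != '-') {
                // Emit a digit only for consonants whose code differs from
                // the previous letter's code.
                if (code != '0' && code != prev) {
                    out.append(code);
                }
                prev = code;
                if (out.length() == 4) {
                    break;
                }
            }
        }
        if (out.length() == 0) {
            return ""; // no letters in the input
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }
}
```

With this sketch, `Soundex.encode("Smith")` and `Soundex.encode("Smyth")` produce the same code, which plain edit distance would only treat as "close". Soundex is tuned for English surnames, so other phonetic schemes may suit other data better.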

Upvotes: 1
