Reputation: 18916
Lucene implements fuzzy matching with a basic edit-distance algorithm. Are there other fuzzy-matching implementations for Lucene that use different similarity metrics? They should also identify homophones. A comparison of the various fuzzy-matching approaches for Lucene would also be helpful.
Upvotes: 1
Views: 1103
Reputation: 1
Something I've been doing is pretty simple and works in most scenarios. In my case, I have 6.7 million event names in a dirty table that contains slightly altered or drilled-down versions of event names, and the table I'm fuzzy matching against has all the clean event names.
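``-- x and y are placeholders for the substring start position and length.
-- With '=', the '%' characters are literal text, not wildcards, so this
-- join condition tests whether the two substrings are equal.
select distinct a.Column, b.Column
from tableA a
inner join tableB b
on '%' + SUBSTRING(b.Column, x, y) + '%' = '%' + SUBSTRING(a.Column, x, y) + '%'
order by a.Column asc;``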
My problem was that when I matched on the full names, with no substring, I got only about 11 results, because the naming conventions in the two tables differ so much. Matching on a substring instead pairs all of the drill-down-style events with their broader counterparts in the clean table.
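To make the idea concrete, here is a minimal sketch of the same substring join as a small Python script using the standard-library `sqlite3` module. Everything specific in it is made up for illustration: the sample event names, the `name` column, the helper `substring_matches`, and the choice of start 1 and length 8 for the `x` and `y` placeholders. SQLite spells the function `substr` rather than `SUBSTRING`.

```python
import sqlite3

# Hypothetical sample data: dirty names are drilled-down variants of clean ones.
DIRTY = ["Spring Gala 2019 - VIP", "Spring Gala (Evening)", "Tech Summit Day 2", "Book Fair"]
CLEAN = ["Spring Gala", "Tech Summit", "Charity Run"]


def substring_matches(dirty, clean, start=1, length=8):
    """Pair dirty and clean names whose substrings (1-based start, given length) are equal."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableA (name TEXT)")  # dirty names
    conn.execute("CREATE TABLE tableB (name TEXT)")  # clean names
    conn.executemany("INSERT INTO tableA VALUES (?)", [(n,) for n in dirty])
    conn.executemany("INSERT INTO tableB VALUES (?)", [(n,) for n in clean])
    rows = conn.execute(
        """
        SELECT DISTINCT a.name, b.name
        FROM tableA a
        INNER JOIN tableB b
          ON substr(b.name, ?, ?) = substr(a.name, ?, ?)
        ORDER BY a.name ASC
        """,
        (start, length, start, length),
    ).fetchall()
    conn.close()
    return rows


matches = substring_matches(DIRTY, CLEAN)
for dirty_name, clean_name in matches:
    print(dirty_name, "->", clean_name)
```

With this sample data, comparing whole names finds no pairs at all, while comparing the first 8 characters pairs each drilled-down name with its clean counterpart. The trade-off is that a short prefix can also pair unrelated names that happen to start the same way, so the start and length need tuning against real data.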
Upvotes: 0