Fred
Fred

Reputation: 75

Get all variations of a string using Levenshtein distance

I found a lot of implementation doing the calculation of Levenshtein between 2 strings, but is there any implementation that can generate all variations using Levenshtein distance (max 2) for one given string.

The reason is, I'm using ElasticSearch to execute some fuzzy search, but with the load of queries that I have I have some performance issue because ELK will calculate those possibilities each time, I want to store those values once.

Upvotes: 1

Views: 1358

Answers (1)

bad_coder
bad_coder

Reputation: 12870

The most commonly cited reference implementation for generating an edit distance is in Python, you can see it in this answer.

The original author linked subsequent implementations in other languages at the bottom of his blog under the heading Other Computer Languages. There are 4 implementations in C#, this one in particular is functional (I'm unsure under what license those implementations are published, so I won't transcribe them into this thread).

But using wildcard searches with ElasticSearch is the correct approach. The engine will implement approximate string matching as efficiently as possible - there are a number of different algorithms this can be based on and the optimal choice depends on your data structure, etc.

You can simplify the use by generating the edit distance yourself, but in most cases if you're using a database or engine their implementation will have better performance. (This is a computationally expensive task, there's no way around that.)

Upvotes: 1

Related Questions