Reputation: 845
I'd like to filter records with lambda expression. The condition is if it contains a given string, or nearly the same string with one character difference. Nearly the same means: one of the characters can be any character, but only one.
For example: The search string: 'ABC' then the condition must be: '[any char]BC' or 'A[any char]C' or 'AB[any char]'
Does anyone know any professional solution? Thanks in advance.
SOLUTION (thanks for LiamK):
var count = s1.Zip(s2, (c1, c2) => c1 == c2 ? 0 : 1).Sum();
Upvotes: 0
Views: 397
Reputation: 815
The metric you are looking for is referred to as the Levenshtein distance between two strings. You can create an implementation of this algorithm, then utilize it inside of your where clause:
public IEnumerable<string> MyFunc(string searchString)
{
return myThingToSearch.Where(x => LevenshteinDistance(x, searchString) <= 1);
}
public static int LevenshteinDistance(string s1, string s2)
{
if (s1 == s2)
{
return 0;
}
if (s1.Length == 0)
{
return s2.Length;
}
if (s2.Length == 0)
{
return s1.Length;
}
int[] v0 = new int[s2.Length + 1];
int[] v1 = new int[s2.Length + 1];
for (int i = 0; i < v0.Length; i++)
{
v0[i] = i;
}
for (int i = 0; i < s1.Length; i++)
{
v1[0] = i + 1;
for (int j = 0; j < s2.Length; j++)
{
var cost = (s1[i] == s2[j]) ? 0 : 1;
v1[j + 1] = Math.Min(v1[j] + 1, Math.Min(v0[j + 1] + 1, v0[j] + cost));
}
for (int j = 0; j < v0.Length; j++)
{
v0[j] = v1[j];
}
}
return v1[s2.Length];
}
Side note: The Levenshtein distance metric will also match strings like 'BC' or 'ABCD' to the search string 'ABC' since those strings are technically only 'off-by-one' from the search string. I'm not sure if this is acceptable within your spec or not. If not, then let us know. That problem is a subset of approximate string matching and you would use the Hamming distance instead
Upvotes: 1