Reputation: 1404
I'm looking to assess similarity (including case) between two strings and give a value between 0 and 1.
I tried the Levenshtein distance implementation but it only gives integers and does not compare inner alphabets.
For e.g. comparing "ABCD" and "Abcd" gives distance of 3 and "AOOO" also gives a distance of 3 but clearly "Abcd" is better match than "AOOO".
So compared to "ABCD" I want "ABcd" to be most similar then "Abcd" then "AOOO" then "AOOOO"
I've also looked here but I am not looking for a variable length algorithm.
Thanks
Upvotes: 1
Views: 3236
Reputation: 31
bool check(string[] a, string s)
{
for (int i = 0; i < a.Length; i++)
if (s == a[i])
return true;
return false;
}
public double simi(string string1, string string2)
{
int sub1 = 0;
int sub2 = 0;
string[] sp1 = new string[string1.Length - 1];
string[] sp2 = new string[string2.Length - 1];
string[] sp3 = new string[string1.Length - 1];
string[] sp4 = new string[string2.Length - 1];
for (int i = 0; i < string1.Length - 1; i++)
{
string x = "";
x = string1.Substring(i, 2);
sp1[sub1] = x;
++sub1;
}
for (int i = 0; i < string2.Length - 1; i++)
{
string x = "";
x = string2.Substring(i, 2);
sp2[sub2] = x;
++sub2;
}
int j = 0, k = 0;
for (int i = 0; i < sp1.Length; i++)
if (check(sp3, sp1[i]) == true)
{
continue;
}
else
{
sp3[j] = sp1[i];
j++;
}
for (int i = 0; i < sp2.Length; i++)
if (check(sp4, sp2[i]) == true)
{
continue;
}
else
{
sp4[k] = sp2[i];
k++;
}
Array.Resize(ref sp3, j);
Array.Resize(ref sp4, k);
Array.Sort<string>(sp3);
Array.Sort<string>(sp4);
int n = 0;
for (int i = 0; i < sp3.Length; i++)
{
if (check(sp4, sp3[i]))
{
n++;
}
}
double resulte;
int l1 = sp3.Length;
int l2 = sp4.Length;
resulte = ((2.0 * Convert.ToDouble(n)) / Convert.ToDouble(l1 + l2)) * 100;
return resulte;
}
Upvotes: 2
Reputation: 2579
Adapt Levenshtein Distance with a custom table T. Let the cost of insertion = 1. The cost of deletion also 1. Let T(c,d) denote the penalty of replacing c with d. T(c,c) should be = 0. T(c,d) should be <= 2.
Define Max(n,m) be the maximum theoretical distance of strings of length n and m. Obviously, Max(n,m) = n+m.
Define Distance(s,t) be the cost of changing s to t divided by Max(s,t). There you go.
Be careful in defining T so that the definition obeys distance axioms:
Then it will be more useful in more situations.
Upvotes: 2
Reputation: 112259
Try something like this
double d = (LevenshteinDist(s, t) + LevenshteinDist(s.ToLower(), t.ToLower())) /
2.0 * Math.Max(s.Length, t.Length);
If you want to give less importance to case differences than letter differences, you can give different weights to the terms
double d = (0.15*LevenshteinDist(s, t) +
0.35*LevenshteinDist(s.ToLower(), t.ToLower())) /
Math.Max(s.Length, t.Length);
Note that the weights sum up to 0.5, thus makting the division by 2.0 obsolete.
Upvotes: 5