Reputation: 61
Is there any function in C# that checks the percentage of similarity between two strings?
For example, I have:
var string1 = "Hello how are you doing";
var string2 = " hi, how are you";
and calling
function(string1, string2)
would return a similarity ratio, because the words "how", "are", and "you" appear in both strings.
Or, even better, it would return 60% similarity, because "how", "are", and "you" are 3 of the 5 words in string1.
Does any function in C# do that?
Upvotes: 6
Views: 4078
Reputation: 3160
Now I am going to risk a -1 for this suggestion, but when you want something close that is not too complex, there are many simpler solutions than the Levenshtein distance, which is perfect if you need exact results and have time to code it.
If you can be a bit looser about accuracy, I would follow these simple rules:
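- Compare the strings directly (strSearch == strReal); if they match, exit.
- Remove vowels and other characters, such as [aeiou-"!], from both strings. Now you have two converted strings. Your search string:
mths dhlgrn mtbrn
and your real string to compare to:
rstrnt mths dhlgrn
- Compare the converted strings; if they match, exit.
- Split each converted string into words using the regex \W+.
- Each word of the search string is worth 100 / 3, roughly 33%.
- Two of the three search words (mths, dhlgrn) also appear in the real string, so the result is 2 × 33 = 66%, i.e. a 66% match.
This method is simple and can be extended in more and more detail. You could run the steps above and, if the last one returns anything above 50%, treat it as a match; otherwise, fall back to more complex calculations.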
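The steps above could be sketched in C# roughly as follows. This is a minimal illustration, not the answerer's own code: the class and method names (LazyMatcher, Consonantize, MatchPercent) are hypothetical, lowercasing before stripping vowels is an added assumption, and the result is the percentage of search-string words that also occur in the real string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LazyMatcher
{
    // Removes the characters from the answer's set [aeiou-"!].
    // Lowercasing first is an added assumption so "A" and "a" behave alike.
    public static string Consonantize(string s) =>
        Regex.Replace(s.ToLowerInvariant(), "[aeiou\\-\"!]", "");

    // Splits on runs of non-word characters (\W+) and drops empty entries.
    private static IEnumerable<string> Words(string s) =>
        Regex.Split(s, @"\W+").Where(w => w.Length > 0);

    // Returns a rough match percentage (0..100) of searchStr against realStr.
    public static double MatchPercent(string searchStr, string realStr)
    {
        // Exact match on the raw strings: done.
        if (searchStr == realStr) return 100.0;

        string search = Consonantize(searchStr);
        string real = Consonantize(realStr);

        // Exact match on the converted strings: done.
        if (search == real) return 100.0;

        string[] searchWords = Words(search).ToArray();
        if (searchWords.Length == 0) return 0.0;

        // Each search word is worth 100 / searchWords.Length percent.
        var realWords = new HashSet<string>(Words(real));
        int matched = searchWords.Count(w => realWords.Contains(w));
        return 100.0 * matched / searchWords.Length;
    }
}
```

For instance, the hypothetical inputs "Mathias Dahlgren Matbaren" and "Restaurant Mathias Dahlgren" convert to exactly the two strings shown above, and MatchPercent returns about 66.7, since 2 of the 3 search words match.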
OK, don't -1 me too fast: the other answers are perfect; this is just a solution for lazy developers, and it may be of value where its results meet your expectations.
Upvotes: 3
Reputation: 17605
A common measure of string similarity is the so-called Levenshtein distance, or edit distance. This approach defines a fixed set of edit operations: inserting, deleting, or substituting a single character. The Levenshtein distance is the minimum number of such edits needed to turn the first string into the second. Closely related is the Damerau-Levenshtein distance, which additionally allows transposing two adjacent characters.
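To make the difference in edit operations concrete, here is a sketch of the optimal string alignment distance, a restricted variant of Damerau-Levenshtein that adds adjacent transpositions to the three Levenshtein operations. It is not the unrestricted Damerau-Levenshtein distance, which can give smaller values (for "ca" vs "abc" it gives 2, while this variant gives 3), and the class name DamerauSketch is hypothetical.

```csharp
using System;

public static class DamerauSketch
{
    // Optimal string alignment (restricted Damerau-Levenshtein) distance:
    // insertion, deletion, substitution, and transposition of two adjacent
    // characters, each costing 1. No substring may be edited more than once.
    public static int OptimalStringAlignment(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i; // delete all of a[..i]
        for (int j = 0; j <= b.Length; j++) d[0, j] = j; // insert all of b[..j]

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(
                    d[i - 1, j] + 1,         // deletion
                    d[i, j - 1] + 1),        // insertion
                    d[i - 1, j - 1] + cost); // substitution or match

                // Transposition of two adjacent characters, e.g. "ab" -> "ba".
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                    d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + 1);
            }
        }
        return d[a.Length, b.Length];
    }
}
```

For example, OptimalStringAlignment("ab", "ba") returns 1, whereas the plain Levenshtein distance for the same pair is 2 (two substitutions).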
Algorithmically, the Levenshtein distance can be calculated using dynamic programming in O(m·n) time for strings of lengths m and n, which is efficient enough for typical inputs. However, note that this approach compares characters, not individual words, and the raw distance is a count of edits, so it cannot directly express the similarity as a percentage.
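A minimal sketch of the dynamic-programming computation follows. The names are hypothetical, and SimilarityPercent uses one common normalization (1 minus the distance divided by the length of the longer string), which is an assumption rather than part of the Levenshtein definition; like the distance itself, it works on characters, not words.

```csharp
using System;

public static class LevenshteinSketch
{
    // Dynamic-programming Levenshtein distance that keeps only two rows of
    // the (m+1) x (n+1) table: O(m*n) time, O(n) extra memory.
    public static int Distance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j; // "" -> b[..j]: j insertions

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // a[..i] -> "": i deletions
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(
                    prev[j] + 1,         // deletion
                    curr[j - 1] + 1),    // insertion
                    prev[j - 1] + cost); // substitution or match
            }
            (prev, curr) = (curr, prev); // the finished row becomes "prev"
        }
        return prev[b.Length];
    }

    // Assumed normalization: 100 * (1 - distance / longer length).
    public static double SimilarityPercent(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        if (longest == 0) return 100.0; // two empty strings are identical
        return 100.0 * (1.0 - (double)Distance(a, b) / longest);
    }
}
```

For example, Distance("kitten", "sitting") returns 3, so SimilarityPercent for that pair is about 57.1 (4/7 of the longer string's length).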
Upvotes: 5
Reputation: 180
You can create a function that splits both strings into arrays of words, and then iterate over one of them to check whether each word exists in the other one.
If you want a percentage, count the total number of words, see how many of them match, and compute the ratio from that.
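A minimal sketch of this approach, assuming words are separated by non-word characters (\W+) and compared case-insensitively; the class and method names are hypothetical. It reports the share of words in the first string that also appear in the second, which is the 60% figure the question asks for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordOverlapSketch
{
    // Lowercases, splits on runs of non-word characters, and drops empty
    // entries (e.g. the one produced by a leading space or comma).
    private static string[] Words(string s) =>
        Regex.Split(s.ToLowerInvariant(), @"\W+")
             .Where(w => w.Length > 0)
             .ToArray();

    // Percentage (0..100) of the words in 'source' that also occur in 'other'.
    // Repeated words in 'source' are counted each time they appear.
    public static double WordSimilarityPercent(string source, string other)
    {
        string[] sourceWords = Words(source);
        if (sourceWords.Length == 0) return 0.0;

        var otherWords = new HashSet<string>(Words(other)); // fast lookups
        int matches = sourceWords.Count(w => otherWords.Contains(w));
        return 100.0 * matches / sourceWords.Length;
    }
}
```

With the question's strings, WordSimilarityPercent("Hello how are you doing", " hi, how are you") returns 60, because 3 of the 5 words in string1 ("how", "are", "you") appear in string2. Note that the measure is not symmetric: swapping the arguments gives 75 (3 of 4).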
Upvotes: 0