Silver Sine

Reputation: 61

How to check a partial similarity of two strings in C#

Is there a function in C# that checks the percentage similarity of two strings?

For example, I have:

var string1 = "Hello how are you doing";
var string2 = " hi, how are you";

and a call such as

function(string1, string2)

would return a similarity ratio, because the words "how", "are", "you" appear in both strings.

Or, even better, it would return 60% similarity, because "how", "are", "you" make up 3/5 of the words in string1.

Does any function exist in C# that does that?

Upvotes: 6

Views: 4078

Answers (3)

hexerei software

Reputation: 3160

Now I am going to risk a -1 here for my suggestion, but in situations where you want something that is close enough without being complex, there are a lot of simpler solutions than the Levenshtein distance, which is perfect if you need exact results and have time to code it.

If you can be a bit looser about accuracy, then I would follow these simple rules:

  1. compare the strings literally first (strSearch == strReal); if they match, exit
  2. convert the search string and the real string to lowercase
  3. remove vowels and other chars from the strings [aeiou-"!]

    Now you have two converted strings. Your search string:

    mths dhlgrn mtbrn
    

    and your real string to compare against:

    rstrnt mths dhlgrn
    
  4. compare the converted strings; if they match, exit

  5. split only the search string into its words, either with a simple split function or with the regular expression \W+
  6. calculate the value (weight) of one part by dividing 100 by the number of parts, in this case 33
  7. check whether each part of the search string is contained in the real string, and add the weight of each match to your total. In this case we have three parts and two matches, so the result is 66, i.e. a 66% match
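
The steps above could be sketched in C# roughly as follows. This is only one possible reading of the rules, not a definitive implementation: the names LazyMatcher, Normalize and Score are made up for illustration, and the character class is taken literally from step 3.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LazyMatcher // hypothetical name, for illustration only
{
    // Steps 2-3: lowercase, then strip vowels and the characters - " !
    private static string Normalize(string s) =>
        Regex.Replace(s.ToLowerInvariant(), "[aeiou\\-\"!]", "");

    // Returns a score from 0 to 100 for `search` against `real`.
    public static double Score(string search, string real)
    {
        // Step 1: literal comparison.
        if (search == real) return 100.0;

        string s = Normalize(search);
        string r = Normalize(real);

        // Step 4: compare the converted strings.
        if (s == r) return 100.0;

        // Step 5: split only the search string into words.
        string[] parts = Regex.Split(s, @"\W+")
                              .Where(p => p.Length > 0)
                              .ToArray();
        if (parts.Length == 0) return 0.0;

        // Step 6: each part carries an equal share of 100.
        double weight = 100.0 / parts.Length;

        // Step 7: add the weight of every part contained in the real string.
        int matches = parts.Count(p => r.Contains(p));
        return matches * weight;
    }
}
```

With the two converted strings from the example, LazyMatcher.Score("mths dhlgrn mtbrn", "rstrnt mths dhlgrn") finds two of three parts. Because this sketch uses floating-point weights, it yields roughly 66.7 where the integer arithmetic in the steps gives 66.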

This method is simple and can be extended to go into more and more detail. For example, you could use steps 1-7 and treat anything above 50% from step 7 as a match, and otherwise fall back to more complex calculations.

OK, now don't -1 me too fast, because the other answers are perfect. This is just a solution for lazy developers, and it may be of value wherever the result meets expectations.

Upvotes: 3

Codor

Reputation: 17605

A common measure of string similarity is the so-called Levenshtein distance, or edit distance. In this approach, a set of edit operations is defined: inserting, deleting or substituting a single character. The Levenshtein distance is the minimum number of edit steps necessary to obtain the second string from the first. Closely related is the Damerau-Levenshtein distance, which additionally allows transposing two adjacent characters.

Algorithmically, the Levenshtein distance can be calculated using dynamic programming in time proportional to the product of the two string lengths, which can be considered efficient. However, note that this approach works on characters rather than whole words, and yields an edit count, so it cannot directly express the similarity in percent.
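
As a minimal sketch of the dynamic-programming calculation, the following C# fills a table of distances between prefixes of the two strings. The class and method names are hypothetical. The SimilarityPercent helper is an added assumption rather than part of the distance itself: dividing by the longer length is one common way to turn the edit count into a percentage, but other normalizations exist.

```csharp
using System;

public static class EditDistance // hypothetical name, for illustration only
{
    // Levenshtein distance: insertions, deletions and substitutions each cost 1.
    public static int Levenshtein(string a, string b)
    {
        // d[i, j] = distance between the first i chars of a and the first j chars of b.
        var d = new int[a.Length + 1, b.Length + 1];

        for (int i = 0; i <= a.Length; i++) d[i, 0] = i; // delete all i chars
        for (int j = 0; j <= b.Length; j++) d[0, j] = j; // insert all j chars

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1,   // deletion
                             d[i, j - 1] + 1),  // insertion
                    d[i - 1, j - 1] + cost);    // substitution or match
            }
        }
        return d[a.Length, b.Length];
    }

    // Assumption: normalize by the longer string's length to get a 0-100 figure.
    public static double SimilarityPercent(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        if (longest == 0) return 100.0; // two empty strings are identical
        return (1.0 - (double)Levenshtein(a, b) / longest) * 100.0;
    }
}
```

For instance, EditDistance.Levenshtein("kitten", "sitting") is 3 (two substitutions and one insertion).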

Upvotes: 5

Slugge

Reputation: 180

You can create a function that splits both strings into arrays of words, and then iterate over one of them to check whether each word exists in the other.

If you want a percentage, you would have to count the total number of words, see how many also appear in the other string, and compute a number from that.
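
A short sketch of that idea in C#, under a few assumptions that are not in the text above: words are split on non-word characters (so "hi," becomes "hi"), the comparison ignores case, and the percentage is taken relative to the first string. The names WordOverlap, Words and Percent are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordOverlap // hypothetical name, for illustration only
{
    // Lowercase the input and split it into words on runs of non-word characters.
    private static string[] Words(string s) =>
        Regex.Split(s.ToLowerInvariant(), @"\W+")
             .Where(w => w.Length > 0)
             .ToArray();

    // Percentage of the words in `first` that also occur in `second`.
    public static double Percent(string first, string second)
    {
        string[] firstWords = Words(first);
        if (firstWords.Length == 0) return 0.0;

        // A set makes each lookup cheap compared with scanning an array.
        var secondWords = new HashSet<string>(Words(second));
        int shared = firstWords.Count(w => secondWords.Contains(w));
        return 100.0 * shared / firstWords.Length;
    }
}
```

With the strings from the question, WordOverlap.Percent("Hello how are you doing", " hi, how are you") returns 60, since three of the five words in the first string appear in the second. Note that a word repeated in the first string is counted once per occurrence.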

Upvotes: 0
