Pece
Pece

Reputation: 637

Find out how much percent one string contains in another

I need to find out how much percentage or chars does one string contains into another string. I've tried Levenshtein Distance but that algorithm returns how much char's are needed to be change for the strings to be equal. Can some one help? I need it in c# but that's not so important.

The answer code: public double LongestCommonSubsequence(string s1, string s2) { //if either string is empty, the length must be 0 if (String.IsNullOrEmpty(s1) || String.IsNullOrEmpty(s2)) return 0;

    int[,] num = new int[s1.Length, s2.Length];  //2D array
    char letter1;
    char letter2;

    //Actual algorithm
    for (int i = 0; i < s1.Length; i++)
    {
        letter1 = s1[i];
        for (int j = 0; j < s2.Length; j++)
        {
            letter2 = s2[j];

            if (letter1 == letter2)
            {
                if ((i == 0) || (j == 0))
                    num[i, j] = 1;
                else
                    num[i, j] = 1 + num[i - 1, j - 1];
            }
            else
            {
                if ((i == 0) && (j == 0))
                    num[i, j] = 0;
                else if ((i == 0) && !(j == 0))   //First ith element
                    num[i, j] = Math.Max(0, num[i, j - 1]);
                else if (!(i == 0) && (j == 0))   //First jth element
                    num[i, j] = Math.Max(num[i - 1, j], 0);
                else // if (!(i == 0) && !(j == 0))
                    num[i, j] = Math.Max(num[i - 1, j], num[i, j - 1]);
            }
        }//end j
    }//end i
    return (s2.Length - (double)num[s1.Length - 1, s2.Length - 1]) / s1.Length * 100; 
} //end LongestCommonSubsequence

Upvotes: 4

Views: 2688

Answers (2)

MSN
MSN

Reputation: 54634

Uhh... can't you just use the number of characters that need to change then?

(length(destination)-changed_character_count)/ length(source)

EDIT: based on the revised question, treat both strings as sets, compute the set intersection, and base the percentage off the size of that set and the source string as a set.

Upvotes: 0

Mark Byers
Mark Byers

Reputation: 839074

It sounds like you might want the longest common subsequence which is the basis for diff algorithms. Unfortunately this problem is NP-hard which means there is no efficient (polynomial time) solution. The Wikipedia page has some suggestions.

Upvotes: 2

Related Questions