kakopappa
kakopappa

Reputation: 5095

c# longest common words sample

i am looking for a longest common words c# implementation. Most of the samples i have came across are comparing character by character.

in otherwords,

string1 = access
string2 = advised 

should return null output from the function

any sample codes?

Upvotes: 1

Views: 2406

Answers (4)

Seth Flowers
Seth Flowers

Reputation: 9190

Finding differences in strings is called the Longest Common Subsequence problem. The following is a generic solution to the LCS problem, written in C#:

static int[,] GetLCSDifferenceMatrix<T>(
    Collection<T> baseline,
    Collection<T> revision)
{
    int[,] matrix = new int[baseline.Count + 1, revision.Count + 1];

    for (int baselineIndex = 0; baselineIndex < baseline.Count; baselineIndex++)
    {
        for (int revisionIndex = 0; revisionIndex < revision.Count; revisionIndex++)
        {
            if (baseline[baselineIndex].Equals(revision[revisionIndex]))
            {
                matrix[baselineIndex + 1, revisionIndex + 1] =
                    matrix[baselineIndex, revisionIndex] + 1;
            }
            else
            {
                int possibilityOne = matrix[baselineIndex + 1, revisionIndex];
                int possibilityTwo = matrix[baselineIndex, revisionIndex + 1];

                matrix[baselineIndex + 1, revisionIndex + 1] =
                    Math.Max(possibilityOne, possibilityTwo);
            }
        }
    }

    return matrix;
}

This code gives you a "difference" matrix, which can then be used to construct the difference from the two inputs. For unit tests and example usage, see http://sethflowers.com/2012/01/18/basic-diff-with-a-generic-solution-to-the-longest-common-subsequence-problem.html.

Upvotes: 1

Eric Lippert
Eric Lippert

Reputation: 660455

Turning the algorithm which computes LCS of arrays of characters into one that does it to arrays of anything else -- like, say, an array of words -- is usually pretty straightforward. Have you tried that?

If you need some hints, here's an article I wrote a couple years ago on how to implement Longest Common Subsequence on an array of words in JScript. You should be able to adapt it to C# without too much difficulty.

http://blogs.msdn.com/ericlippert/archive/2004/07/21/189974.aspx

Upvotes: 1

Jens
Jens

Reputation: 25593

If by word you mean these letter things, seperated from the others by punktuation, try this:

private String longestCommonWord(String s1, String s2)
    {
        String[] seperators = new String[] { " ", ",", ".", "!", "?", ";" };
        var result = from w1 in s1.Split(seperators, StringSplitOptions.RemoveEmptyEntries)
                     where (from w2 in s2.Split(seperators, StringSplitOptions.RemoveEmptyEntries)
                            where w2 == w1
                            select w2).Count() > 0
                     orderby w1.Length descending
                     select w1;
        if (result.Count() > 0)
        {
            return result.First();
        }
        else
        {
            return null;
        }
    }

This probably is not the most elegant way to do it, but it works for me. =)

Upvotes: 1

Ronald Zarīts
Ronald Zarīts

Reputation: 12699

I think this problem is usually referred to as the Longest common substring problem. The Wikipedia article contains pseudocode, and C# implementations can be found on the Web.

Upvotes: 2

Related Questions