Reputation: 5095
i am looking for a longest common words c# implementation. Most of the samples i have came across are comparing character by character.
in otherwords,
string1 = access
string2 = advised
should return null output from the function
any sample codes?
Upvotes: 1
Views: 2406
Reputation: 9190
Finding differences in strings is called the Longest Common Subsequence problem. The following is a generic solution to the LCS problem, written in C#:
static int[,] GetLCSDifferenceMatrix<T>(
Collection<T> baseline,
Collection<T> revision)
{
int[,] matrix = new int[baseline.Count + 1, revision.Count + 1];
for (int baselineIndex = 0; baselineIndex < baseline.Count; baselineIndex++)
{
for (int revisionIndex = 0; revisionIndex < revision.Count; revisionIndex++)
{
if (baseline[baselineIndex].Equals(revision[revisionIndex]))
{
matrix[baselineIndex + 1, revisionIndex + 1] =
matrix[baselineIndex, revisionIndex] + 1;
}
else
{
int possibilityOne = matrix[baselineIndex + 1, revisionIndex];
int possibilityTwo = matrix[baselineIndex, revisionIndex + 1];
matrix[baselineIndex + 1, revisionIndex + 1] =
Math.Max(possibilityOne, possibilityTwo);
}
}
}
return matrix;
}
This code gives you a "difference" matrix, which can then be used to construct the difference from the two inputs. For unit tests and example usage, see http://sethflowers.com/2012/01/18/basic-diff-with-a-generic-solution-to-the-longest-common-subsequence-problem.html.
Upvotes: 1
Reputation: 660455
Turning the algorithm which computes LCS of arrays of characters into one that does it to arrays of anything else -- like, say, an array of words -- is usually pretty straightforward. Have you tried that?
If you need some hints, here's an article I wrote a couple years ago on how to implement Longest Common Subsequence on an array of words in JScript. You should be able to adapt it to C# without too much difficulty.
http://blogs.msdn.com/ericlippert/archive/2004/07/21/189974.aspx
Upvotes: 1
Reputation: 25593
If by word you mean these letter things, seperated from the others by punktuation, try this:
private String longestCommonWord(String s1, String s2)
{
String[] seperators = new String[] { " ", ",", ".", "!", "?", ";" };
var result = from w1 in s1.Split(seperators, StringSplitOptions.RemoveEmptyEntries)
where (from w2 in s2.Split(seperators, StringSplitOptions.RemoveEmptyEntries)
where w2 == w1
select w2).Count() > 0
orderby w1.Length descending
select w1;
if (result.Count() > 0)
{
return result.First();
}
else
{
return null;
}
}
This probably is not the most elegant way to do it, but it works for me. =)
Upvotes: 1
Reputation: 12699
I think this problem is usually referred to as the Longest common substring problem. The Wikipedia article contains pseudocode, and C# implementations can be found on the Web.
Upvotes: 2