Reputation: 342
Lets suppose i have two strings string1
and string2
.
var string1 = "images of canadian geese goslings";
var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";
I need to find biggest substring of string1
which matches in string2
.
here biggest substring will be "canadian geese"
which is matching in string2
.
How can I find it? I tried breaking string1
into char[]
and find words then merged matched words but that failed my objective.
Upvotes: 2
Views: 195
Reputation: 1369
I'm going to add one more, using span/readonlymemory so you can avoid allocating all the strings that the current answers create. Note I didn't do any check for starting space or ending space as that does not seem to be a requirement for the question. This does do a case insensitive search, if you don't want that you can make it more efficient by using the built in indexof and dropping the case insensitive compares.
static void Main(string[] _)
{
var string1 = "images of canadian geese goslings";
var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";
var longest = FindLongestMatchingSubstring(string1, string2);
Console.WriteLine(longest);
}
static string FindLongestMatchingSubstring(string lhs, string rhs)
{
var left = lhs.AsMemory();
var right = rhs.AsMemory();
ReadOnlyMemory<char> longest = ReadOnlyMemory<char>.Empty;
for (int i = 0; i < left.Length; ++i)
{
foreach (var block in FindMatchingSubSpans(left, i, right))
{
if (block.Length > longest.Length)
longest = block;
}
}
if (longest.IsEmpty)
return string.Empty;
return longest.ToString();
}
static IEnumerable<ReadOnlyMemory<char>> FindMatchingSubSpans(ReadOnlyMemory<char> source, int pos, ReadOnlyMemory<char> matchFrom)
{
int lastMatch = 0;
for (int i = pos; i < source.Length; ++i)
{
var ch = source.Span[i];
int match = IndexOfChar(matchFrom, lastMatch, ch);
if (-1 != match)
{
lastMatch = match + 1;
int end = i;
while (++end < source.Length && ++match < matchFrom.Length)
{
char lhs = source.Span[end];
char rhs = matchFrom.Span[match];
if (lhs != rhs && lhs != (char.IsUpper(rhs) ? char.ToLower(rhs) : char.ToUpper(rhs)))
{
break;
}
}
yield return source.Slice(i, end - i);
}
}
}
static int IndexOfChar(ReadOnlyMemory<char> source, int pos, char ch)
{
char alt = char.IsUpper(ch) ? char.ToLower(ch) : char.ToUpper(ch);
for (int i = pos; i < source.Length; ++i)
{
char m = source.Span[i];
if (m == ch || m == alt)
return i;
}
return -1;
}
Upvotes: 1
Reputation: 17382
Have a look at the following code https://dotnetfiddle.net/aPyw3o
public class Program {
static IEnumerable<string> substrings(string s, int length) {
for (int i = 0 ; i + length <= s.Length; i++) {
var ss = s.Substring(i, length);
if (!(ss.StartsWith(" ") || ss.EndsWith(" ")))
yield return ss;
}
}
public static void Main()
{
int count = 0;
var string1 = "images of canadian geese goslings";
var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";
string result = null;
for (int i = string1.Length; i>0 && string.IsNullOrEmpty(result); i--) {
foreach (string s in substrings(string1, i)) {
count++;
if (string2.IndexOf(s, StringComparison.CurrentCultureIgnoreCase) >= 0) {
result = s;
break;
}
}
}
if (string.IsNullOrEmpty(result))
Console.WriteLine("no common substrings found");
else
Console.WriteLine("'" + result + "'");
Console.WriteLine(count);
}
}
The substrings
method returns all substrings of the string s
with the length of length
(for the yield
have a look at the documentation https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/yield) We skip substrings that start or end with a space, as we don't want whitespaces to make a substring longer than it really is)
The outer loop iterates through all the possible length values for the substrings, from the longest (ie string1.Length
) to the shortest (ie 1
). Then for each of the found substrings of length i
is checked, if it's also a substring of string2
. If that's the case, we can stop, as there cannot be any longer common substring, because we checked all longer substrings in previous iterations. But of course there may be other common substrings with length i
Upvotes: 2
Reputation: 45947
classy loop approach - the result includes te space after geese "canadian geese "
var string1 = "images of canadian geese goslings";
var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";
string result = "";
for (int i = 0; i < string1.Length; i++)
{
for (int j = 0; j < string1.Length - i; j++)
{
//add .Trim() here if you want to ignore space characters
string searchpattern = string1.Substring(i, j);
if (string2.IndexOf(searchpattern, StringComparison.OrdinalIgnoreCase) > -1 && searchpattern.Length > result.Length)
{
result = searchpattern;
}
}
}
https://dotnetfiddle.net/q3rHjI
Side note:
canadian
and Canadian
are not equal so you have to use StringComparison.OrdinalIgnoreCase
if you want to search case insensitive
Upvotes: 4