sarvesh kushwaha
sarvesh kushwaha

Reputation: 342

How to find biggest substring from string1 into string2

Lets suppose i have two strings string1 and string2.

var string1 = "images of canadian geese goslings";

var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";

I need to find biggest substring of string1 which matches in string2.

here biggest substring will be "canadian geese" which is matching in string2.

How can I find it? I tried breaking string1 into char[] and find words then merged matched words but that failed my objective.

Upvotes: 2

Views: 195

Answers (3)

MikeJ
MikeJ

Reputation: 1369

I'm going to add one more, using span/readonlymemory so you can avoid allocating all the strings that the current answers create. Note I didn't do any check for starting space or ending space as that does not seem to be a requirement for the question. This does do a case insensitive search, if you don't want that you can make it more efficient by using the built in indexof and dropping the case insensitive compares.

    static void Main(string[] _)
    {
        var string1 = "images of canadian geese goslings";

        var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";

        var longest = FindLongestMatchingSubstring(string1, string2);

        Console.WriteLine(longest);
    }

    static string FindLongestMatchingSubstring(string lhs, string rhs)
    {
        var left = lhs.AsMemory();
        var right = rhs.AsMemory();

        ReadOnlyMemory<char> longest = ReadOnlyMemory<char>.Empty;

        for (int i = 0; i < left.Length; ++i)
        {
            foreach (var block in FindMatchingSubSpans(left, i, right))
            {
                if (block.Length > longest.Length)
                    longest = block;
            }
        }

        if (longest.IsEmpty)
            return string.Empty;

        return longest.ToString();
    }

    static IEnumerable<ReadOnlyMemory<char>> FindMatchingSubSpans(ReadOnlyMemory<char> source, int pos, ReadOnlyMemory<char> matchFrom)
    {
        int lastMatch = 0;

        for (int i = pos; i < source.Length; ++i)
        {
            var ch = source.Span[i];

            int match = IndexOfChar(matchFrom, lastMatch, ch);

            if (-1 != match)
            {
                lastMatch = match + 1;

                int end = i;

                while (++end < source.Length && ++match < matchFrom.Length)
                {
                    char lhs = source.Span[end];
                    char rhs = matchFrom.Span[match];

                    if (lhs != rhs && lhs != (char.IsUpper(rhs) ? char.ToLower(rhs) : char.ToUpper(rhs)))
                    {
                        break;
                    }
                }

                yield return source.Slice(i, end - i);
            }
        }
    }

    static int IndexOfChar(ReadOnlyMemory<char> source, int pos, char ch)
    {
        char alt = char.IsUpper(ch) ? char.ToLower(ch) : char.ToUpper(ch);

        for (int i = pos; i < source.Length; ++i)
        {
            char m = source.Span[i];

            if (m == ch || m == alt)
                return i;
        }

        return -1;
    }

Upvotes: 1

derpirscher
derpirscher

Reputation: 17382

Have a look at the following code https://dotnetfiddle.net/aPyw3o

public class Program {

static IEnumerable<string> substrings(string s, int length) {
    for (int i = 0 ; i + length <= s.Length; i++) {
        var ss = s.Substring(i, length);
        if (!(ss.StartsWith(" ") || ss.EndsWith(" ")))
            yield return ss;
    }
}

public static void Main()
{
    int count = 0;
    var string1 = "images of canadian geese goslings";
    var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";
    string result = null;
    for (int i = string1.Length; i>0 && string.IsNullOrEmpty(result); i--) {
        foreach (string s in substrings(string1, i)) {
            count++;
            if (string2.IndexOf(s, StringComparison.CurrentCultureIgnoreCase) >= 0) {
                result = s;
                break;
            }
        }
    }
    if (string.IsNullOrEmpty(result)) 
        Console.WriteLine("no common substrings found");
    else 
        Console.WriteLine("'" + result + "'");
    Console.WriteLine(count);
}

}   

The substrings method returns all substrings of the string s with the length of length (for the yield have a look at the documentation https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/yield) We skip substrings that start or end with a space, as we don't want whitespaces to make a substring longer than it really is)

The outer loop iterates through all the possible length values for the substrings, from the longest (ie string1.Length) to the shortest (ie 1). Then for each of the found substrings of length i is checked, if it's also a substring of string2. If that's the case, we can stop, as there cannot be any longer common substring, because we checked all longer substrings in previous iterations. But of course there may be other common substrings with length i

Upvotes: 2

fubo
fubo

Reputation: 45947

classy loop approach - the result includes te space after geese "canadian geese "

var string1 = "images of canadian geese goslings";
var string2 = "Canadian geese with goslings pictures to choose from, with no signup needed";

string result = "";

for (int i = 0; i < string1.Length; i++)
{
    for (int j = 0; j < string1.Length - i; j++)
    {
        //add .Trim() here if you want to ignore space characters
        string searchpattern = string1.Substring(i, j);
        if (string2.IndexOf(searchpattern,  StringComparison.OrdinalIgnoreCase) > -1 && searchpattern.Length > result.Length)
        {
            result = searchpattern;
        }
    }
}

https://dotnetfiddle.net/q3rHjI

Side note: canadian and Canadian are not equal so you have to use StringComparison.OrdinalIgnoreCase if you want to search case insensitive

Upvotes: 4

Related Questions