Astrophage

Reputation: 1439

C#: Find all common words in a List<string>

How can I extract all the words that are common to every string in a list?

Example:

  "Bammler Tokyo SA GOV"
  "Zurich Bammler GOV"
  "London Bammler 12 GOV"
  "New Bammler York GOV"
  // Expected output: Bammler GOV

I tried the following:

    static void Main(string[] args)
    {
        List<string> MyStringList = new List<string>()
        {
            "Bammler Tokyo SA GOV",
            "Zurich Bammler GOV",
            "London Bammler 12 GOV",
            "New Bammler York GOV"
        };

        string shortest = MyStringList.OrderBy(s => s.Length).First();
        IEnumerable<string> shortestSubstrings = getAllSubstrings(shortest).OrderByDescending(s => s.Length);
        var other = MyStringList.Where(s => s != shortest).ToArray();
        string longestCommonIntersection = string.Empty;
        foreach (string subStr in shortestSubstrings)
        {
            bool allContains = other.All(s => s.Contains(subStr));
            if (allContains)
            {
                longestCommonIntersection = subStr;
                break;
            }
        }
    }

    public static IEnumerable<string> getAllSubstrings(string word)
    {
        return from charIndex1 in Enumerable.Range(0, word.Length)
               from charIndex2 in Enumerable.Range(0, word.Length - charIndex1 + 1)
               where charIndex2 >= 2
               select word.Substring(charIndex1, charIndex2);
    }

I found this approach in "Find a common string within a list of strings", but it only extracts the longest common substring, for example "Bammler", rather than every common word.

Upvotes: 0

Views: 1829

Answers (2)

ocuenca

Reputation: 39326

I would go with @Sergey's answer, but I want to add that you can also use a HashSet<string> to compute the intersection:

var list = new List<string>
{
    "Bammler Tokyo SA GOV",
    "Zurich Bammler GOV",
    "London Bammler 12 GOV",
    "New Bammler York GOV"
};

var hash = new HashSet<string>(list.First().Split(' '));
for (int i = 1; i < list.Count; i++)
    hash.IntersectWith(list[i].Split(' '));
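
For reference, the same idea can be wrapped in a reusable method. This is only a sketch: the class name `CommonWordsHash` and the method `FindCommonWords` are hypothetical names introduced for illustration, and the empty-list guard and the removal of empty entries (for inputs with repeated spaces) are assumptions that go beyond the snippet above.

```csharp
using System;
using System.Collections.Generic;

public static class CommonWordsHash
{
    // Hypothetical helper (not part of the original snippet): returns the
    // words that occur in every string of the list.
    public static HashSet<string> FindCommonWords(IReadOnlyList<string> lines)
    {
        var common = new HashSet<string>();
        if (lines.Count == 0)
            return common; // nothing to intersect

        var separators = new[] { ' ' };

        // Seed the set with the words of the first string.
        common.UnionWith(lines[0].Split(separators, StringSplitOptions.RemoveEmptyEntries));

        // Keep only the words that also appear in each remaining string.
        for (int i = 1; i < lines.Count; i++)
            common.IntersectWith(lines[i].Split(separators, StringSplitOptions.RemoveEmptyEntries));

        return common;
    }
}
```

Calling `CommonWordsHash.FindCommonWords(list)` with the four strings above yields a set containing "Bammler" and "GOV". A `HashSet<string>` does not guarantee any enumeration order, so sort the result if the order matters.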

Upvotes: 2

Sergey Berezovskiy

Reputation: 236248

You can aggregate the intersection of the words from all strings:

var result = MyStringList.Select(s => s.Split())
    .Aggregate(
         MyStringList[0].Split().AsEnumerable(), // init accum with words from first string
         (a, words) => a.Intersect(words),       // intersect with next set of words
         a => a);

Output:

[
  "Bammler",
  "GOV"
]
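
To get the single string "Bammler GOV" that the question asks for, the aggregated words can be joined with spaces. The sketch below is one possible way to package this: `CommonWordsLinq` and `JoinCommonWords` are hypothetical names, and the handling of an empty input (returning an empty string) and the filtering of empty entries are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonWordsLinq
{
    // Hypothetical helper: returns the words shared by all strings,
    // joined with single spaces.
    public static string JoinCommonWords(IEnumerable<string> lines)
    {
        var common = lines
            // Split on whitespace and drop empty entries caused by repeated spaces.
            .Select(s => s.Split().Where(w => w.Length > 0))
            // Assumption: an empty input produces an empty result instead of throwing.
            .DefaultIfEmpty(Enumerable.Empty<string>())
            // Intersect the running result with each next set of words.
            .Aggregate((a, words) => a.Intersect(words));

        return string.Join(" ", common);
    }
}
```

`Intersect` yields elements in the order they appear in its first sequence, so `CommonWordsLinq.JoinCommonWords(MyStringList)` returns "Bammler GOV" for the list in the question.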

Upvotes: 4
