OstrichGlue

Reputation: 345

Split string by array of strings, and include words used to split in final array in C#

I'm trying to split a string into an array around words in a string array. Right now, I'm using myString.Split(arrayOfWordsToSplitOn, StringSplitOptions.RemoveEmptyEntries), which splits the string, but doesn't include the actual word that it is splitting on.

For example, if I have the string "My cat and my dog are very lazy", and a string array {"cat", "dog"}, right now it returns {"My", "and my", "are very lazy"}.

However, I would like to have the final output be {"My", "cat", "and my", "dog", "are very lazy"}. Is there any way to do this?

Upvotes: 1

Views: 74

Answers (1)

Wiktor Stribiżew

Reputation: 626806

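You can build an alternation-based regex out of your list of search words and wrap that part in a capturing group, (...). When the pattern contains a capturing group, Regex.Split includes the captured text in the resulting array, which is what keeps the search words in the output. Then add \s* on both sides of the group to strip the whitespace around each match, and use Regex.Split: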

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var arrayOfWordsToSplitOn = new List<string> { "cat", "dog" };
        var s = "My cat and my dog are very lazy";
        // Builds \s*\b(cat|dog)\b\s* : the capturing group keeps each matched word in the output
        var pattern = string.Format(@"\s*\b({0})\b\s*", string.Join("|", arrayOfWordsToSplitOn));
        // Drop the empty entries produced when a search word starts or ends the string
        var results = Regex.Split(s, pattern).Where(x => !String.IsNullOrWhiteSpace(x)).ToList();
        foreach (var res in results)
            Console.WriteLine(res);
    }
}


Results:

My
cat
and my
dog
are very lazy
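
To make the role of the capturing group concrete, here is a minimal sketch that runs the same pattern twice, once with a non-capturing group (?:...) and once with a capturing group (...). The class and variable names are made up for illustration; only Regex.Split and the sample sentence come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    public static void Main()
    {
        var s = "My cat and my dog are very lazy";

        // Non-capturing group: the matched words are consumed and dropped.
        string[] withoutCapture = Regex.Split(s, @"\s*\b(?:cat|dog)\b\s*");

        // Capturing group: each matched word is inserted between the pieces.
        string[] withCapture = Regex.Split(s, @"\s*\b(cat|dog)\b\s*");

        Console.WriteLine(string.Join(" | ", withoutCapture));
        // My | and my | are very lazy
        Console.WriteLine(string.Join(" | ", withCapture));
        // My | cat | and my | dog | are very lazy
    }
}
```

The first call behaves like the String.Split approach from the question; the second is the behavior the answer relies on.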

NOTES:

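  • If the search words can contain non-word chars, the pattern should be adjusted: \b (word boundaries) might make the match fail next to such chars, and each search "word" will have to be escaped with Regex.Escape so that chars like + or . are matched literally.
  • If you decide to drop word boundaries, the search word array might need sorting by length in descending order (and then alphabetically): the regex engine tries the alternatives from left to right, so a shorter word listed before a longer word that starts with it would always win.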
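
The two notes above can be sketched roughly as follows. This is only an illustration under stated assumptions: SplitKeepingDelimiters and DelimiterSplitter are hypothetical names, word boundaries are dropped entirely, and the sample input is invented. Without \b, a search word will also match inside longer words, so this variant only suits inputs where that is acceptable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterSplitter
{
    // Hypothetical helper (not part of the original answer).
    public static List<string> SplitKeepingDelimiters(string input, IEnumerable<string> delimiters)
    {
        // Longest first, so "c++" is tried before "c"; ties are broken ordinally.
        // Regex.Escape makes chars such as '+' match literally.
        var alternatives = delimiters
            .OrderByDescending(d => d.Length)
            .ThenBy(d => d, StringComparer.Ordinal)
            .Select(d => Regex.Escape(d));

        // No \b here: word boundaries are unreliable next to non-word chars.
        var pattern = @"\s*(" + string.Join("|", alternatives) + @")\s*";

        return Regex.Split(input, pattern)
            .Where(part => !string.IsNullOrWhiteSpace(part))
            .ToList();
    }

    public static void Main()
    {
        var parts = SplitKeepingDelimiters("I use c and c++ daily", new[] { "c", "c++" });
        Console.WriteLine(string.Join(" | ", parts));
        // I use | c | and | c++ | daily
    }
}
```

If the search words were left in their original order ("c" before "c++"), the engine would match "c" first and leave "++" behind as a separate piece, which is the situation the sorting is meant to avoid.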

Upvotes: 4
