gilliduck

Reputation: 2918

Split from one specific word to another specific word

Given a string like "Foo: Some Text Bar: Some Other Text FooBar: Even More Text", the goal is to split it into:

Foo: Some Text
Bar: Some Other Text
FooBar: Even More Text

I can't figure out the regex for it at all. I can split on the words I want with a pattern like (Foo:)|(Bar:)|(FooBar:), but I can't figure out how to include everything from the beginning of each group to the beginning of the next group (or to the end of the text for the last group).

Upvotes: 2

Views: 81

Answers (1)

Wiktor Stribiżew

Reputation: 626784

You can use Regex.Split to split the string with

(?<!^)\s+(?=\b(?:Bar|Foo(?:Bar)?):)

See the regex demo. Details:

  • (?<!^) - not at the start of string
  • \s+ - 1 or more whitespaces
  • (?=\b(?:Bar|Foo(?:Bar)?):) - immediately to the right, there must be
    • \b - a word boundary
    • (?:Bar|Foo(?:Bar)?) - Bar, Foo or FooBar
    • : - a colon.

C# demo:

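using System;
using System.Text.RegularExpressions;
var s = "Foo: Some Text Bar: Some Other Text FooBar: Even More Text";
// Split at whitespace that is followed by one of the labels and a colon.
var res = Regex.Split(s, @"(?<!^)\s+(?=\b(?:Bar|Foo(?:Bar)?):)");
Console.WriteLine(string.Join("\n", res));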

Output:

Foo: Some Text
Bar: Some Other Text
FooBar: Even More Text
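
If the set of labels is not fixed, the same split pattern could be assembled at run time. The following is only a sketch: BuildSplitPattern is a hypothetical helper that is not part of the answer above, and it assumes every label starts with a word character (otherwise the \b check would not hold).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (an assumption, not from the original answer):
// builds the split pattern from an arbitrary list of labels.
static string BuildSplitPattern(params string[] labels)
{
    // Escape each label so regex metacharacters in it are matched literally.
    var alternation = string.Join("|", labels.Select(Regex.Escape));
    // Same shape as the pattern above: whitespace, not at the string start,
    // followed by one of the labels and a colon.
    return @"(?<!^)\s+(?=\b(?:" + alternation + "):)";
}

var s = "Foo: Some Text Bar: Some Other Text FooBar: Even More Text";
var parts = Regex.Split(s, BuildSplitPattern("Foo", "Bar", "FooBar"));

// Prints one "Label: text" chunk per line.
Console.WriteLine(string.Join("\n", parts));
```

Because the lookahead requires a colon right after the label, the order of the alternatives does not matter here: Foo cannot be matched in place of FooBar.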

Another idea: match any word followed by a colon, together with everything up to the next word followed by a colon (or the end of the string):

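// Needs "using System.Linq;" and "using System.Text.RegularExpressions;".
// s is the input string from the first demo.
var matches = Regex.Matches(s, @"\w+(?:-\w+)*:.*?(?=\s*(?:\w+(?:-\w+)*:|$))", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();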

See this regex demo.

Details:

  • \w+(?:-\w+)*: - 1 or more word chars (letters/digits/underscores), then 0 or more repetitions of - and 1+ word chars, and then a colon
  • .*? - any 0 or more chars, as few as possible (RegexOptions.Singleline lets . match line breaks too)
  • (?=\s*(?:\w+(?:-\w+)*:|$)) - up to the first position that is immediately followed by
    • \s* - 0 or more whitespaces, and then
    • (?:\w+(?:-\w+)*:|$) - either
      • \w+(?:-\w+)*: - 1 or more word chars (letters/digits/underscores), then 0 or more repetitions of - and 1+ word chars, and then a colon
      • | - or
      • $ - end of string.

See the C# demo.
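
For reference, the matching approach can be run end to end roughly as follows. This is a minimal sketch that reuses the sample string from the question; the variable names pattern and matches are only illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "Foo: Some Text Bar: Some Other Text FooBar: Even More Text";

// Each match starts at a "word:" label and lazily extends until the next
// label (optionally preceded by whitespace) or the end of the string.
var pattern = @"\w+(?:-\w+)*:.*?(?=\s*(?:\w+(?:-\w+)*:|$))";

var matches = Regex.Matches(s, pattern, RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

// Prints:
// Foo: Some Text
// Bar: Some Other Text
// FooBar: Even More Text
foreach (var m in matches)
    Console.WriteLine(m);
```

Unlike the Regex.Split version, this one does not need a list of known labels, but it treats any word directly followed by a colon as the start of a new chunk, so a colon inside the text (for example a time like 10:30) would also start a new match.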

Upvotes: 1
