Dan Cook

Reputation: 2072

Regex to split a string but keep delimiters, but not as separate elements

I need to split the following string

the quick brown fox jumps over the lazy dog

into the following tokens:

  1. the
  2. quick brown fox jumps over the
  3. lazy dog

So to explain, I want to split on "the" but include the delimiter in the preceding array element (not as its own, separate element).

Can anyone shed any light on this or perhaps give me the correct regex?

I am using C#.

Upvotes: 3

Views: 2350

Answers (1)

Bernhard Barker

Reputation: 55589

You need to use a look-behind, (?<=...). The name says it all: it looks at the preceding characters to see whether they match some given pattern, without making them part of the match.

This should work:

"(?<=\\bthe) "

So, at each space, the regex checks whether the preceding characters are "the"; if so, that space matches and the string is split there.

Note - We also need to include the word boundary \\b (an escaped \b), otherwise the space after something like "bathe" will also match.
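To see what the word boundary changes, here is a small sketch. The class name BoundaryDemo, the helper SplitAfterThe, and the sample input are made up for illustration; only Regex.Split comes from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Hypothetical helper: splits on a space that is preceded by "the",
    // optionally requiring a word boundary before the "t".
    public static string[] SplitAfterThe(string input, bool useWordBoundary)
    {
        string pattern = useWordBoundary ? @"(?<=\bthe) " : @"(?<=the) ";
        return Regex.Split(input, pattern);
    }

    public static void Main()
    {
        string input = "bathe in the river";

        // Without \b, the space after "bathe" also matches.
        Console.WriteLine(string.Join("|", SplitAfterThe(input, false))); // bathe|in the|river

        // With \b, only the space after the whole word "the" matches.
        Console.WriteLine(string.Join("|", SplitAfterThe(input, true)));  // bathe in the|river
    }
}
```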

Without the look-behind, we'll check all the spaces:

   v     v     v   v     v    v   v    v
the quick brown fox jumps over the lazy dog

With the look-behind, we'll only match those that have "the" before them: (ignoring the \\b for now)

"the " - just found a space, and last characters are "the", so match.
"quick " - just found another space, but last characters are "...k", so no match.
etc.
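Putting it together for the string from the question, a minimal sketch could look like this. The class and method names (SplitDemo, SplitKeepingThe) are placeholders I picked; the pattern is the one given above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: splits on each space that directly follows the word "the".
    // The space is consumed by the split; "the" stays at the end of the preceding token.
    public static string[] SplitKeepingThe(string input)
    {
        return Regex.Split(input, "(?<=\\bthe) ");
    }

    public static void Main()
    {
        string[] tokens = SplitKeepingThe("the quick brown fox jumps over the lazy dog");
        for (int i = 0; i < tokens.Length; i++)
        {
            Console.WriteLine($"{i + 1}. {tokens[i]}");
        }
        // 1. the
        // 2. quick brown fox jumps over the
        // 3. lazy dog
    }
}
```

The match is case-sensitive as written, so "The" at the start of a sentence would not act as a delimiter. If that matters, passing RegexOptions.IgnoreCase as a third argument to Regex.Split should cover it.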


Upvotes: 4
