Regex: finding words that end with the same letter the next word begins with

Question

I tried to get a regex to work but couldn't (probably because I'm fairly new to regex).

Here's what I want to do:

Consider this text: One word, duel. Limes said bye.

Wanted matches: the words word, duel, Limes and said.

As mentioned in the title, I want to match consecutive words where one ends with a letter (for example, "t") and the next one starts with that same letter, case-insensitively.

The closest I got is this expression: [^a-z][a-z]*([a-z])[^a-z]+\1[a-z]*([a-z])[^a-z]+\2[a-z]*[^a-z]

Wiktor Stribiżew · Accepted Answer

You may use

(?i)\b(?<w>\p{L}+)(?:\P{L}+(?<w>(\p{L})(?<=\1\P{L}+\1)\p{L}*))+\b

See the regex demo. The results are in the Group "w" capture collection.

Details

\b - a word boundary
(?<w>\p{L}+) - Group "w" (word): 1 or more BMP Unicode letters
(?:\P{L}+(?<w>(\p{L})(?<=\1\P{L}+\1)\p{L}*))+ - 1 or more repetitions of
- \P{L}+ - 1 or more chars other than BMP Unicode letters
- (?<w>(\p{L})(?<=\1\P{L}+\1)\p{L}*) - Group "w":
  - (\p{L}) - a letter captured into Group 1
  - (?<=\1\P{L}+\1) - immediately to the left of the current position, there must be the same letter as captured in Group 1, 1+ chars other than letters, and the letter in Group 1
  - \p{L}* - 0 or more letters
\b - a word boundary.
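
To make the breakdown above easier to follow, here is a minimal sketch of the same pattern written with RegexOptions.IgnorePatternWhitespace, so each part carries an inline comment. The class name VerbosePatternDemo and the method FirstChain are hypothetical names chosen for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class VerbosePatternDemo
{
    // Same pattern as in the answer, spread over several lines.
    // With IgnorePatternWhitespace, unescaped whitespace is ignored
    // and '#' starts a comment that runs to the end of the line.
    const string Pattern = @"
        \b
        (?<w> \p{L}+ )              # first word of the chain
        (?:
            \P{L}+                  # separators: anything that is not a letter
            (?<w>                   # next word, added to the same group
                (\p{L})             # Group 1: first letter of this word
                (?<= \1 \P{L}+ \1 ) # previous word must end with that letter
                \p{L}*              # rest of the word
            )
        )+
        \b";

    // Returns the words of the first chain found, or an empty list.
    public static List<string> FirstChain(string text)
    {
        var words = new List<string>();
        Match m = Regex.Match(text, Pattern,
            RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
        if (m.Success)
        {
            foreach (Capture c in m.Groups["w"].Captures)
                words.Add(c.Value);
        }
        return words;
    }

    static void Main()
    {
        var chain = FirstChain("One word, duel. Limes said bye.");
        Console.WriteLine(string.Join(", ", chain)); // => word, duel, Limes, said
    }
}
```

Reusing the name "w" for both the first word and every following word is what lets .NET gather the whole chain in a single capture collection.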

C# code demo:

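using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "One word, duel. Limes said bye.";
var pattern = @"\b(?<w>\p{L}+)(?:\P{L}+(?<w>(\p{L})(?<=\1\P{L}+\1)\p{L}*))+\b";
// Every word of the chain is stored as a separate capture of Group "w"
var result = Regex.Match(text, pattern, RegexOptions.IgnoreCase).Groups["w"].Captures
        .Cast<Capture>()
        .Select(x => x.Value);
Console.WriteLine(string.Join(", ", result)); // => word, duel, Limes, said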

A C# demo version without using LINQ:

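using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "One word, duel. Limes said bye.";
string pattern = @"\b(?<w>\p{L}+)(?:\P{L}+(?<w>(\p{L})(?<=\1\P{L}+\1)\p{L}*))+\b";
Match result = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
List<string> output = new List<string>();
if (result.Success)
{
    // Collect each captured word of the chain
    foreach (Capture c in result.Groups["w"].Captures)
        output.Add(c.Value);
}
Console.WriteLine(string.Join(", ", output)); // => word, duel, Limes, said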
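
Both demos call Regex.Match, which returns only the first chain in the text. If the input may contain several separate chains, one possible approach is to iterate over Regex.Matches instead. The sketch below assumes a sample sentence with a second chain appended ("Cat tree eats"); the class name ChainFinder and the method FindChains are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ChainFinder
{
    const string Pattern =
        @"\b(?<w>\p{L}+)(?:\P{L}+(?<w>(\p{L})(?<=\1\P{L}+\1)\p{L}*))+\b";

    // Returns one list of words per chain found in the text.
    public static List<List<string>> FindChains(string text)
    {
        var chains = new List<List<string>>();
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.IgnoreCase))
        {
            var words = new List<string>();
            foreach (Capture c in m.Groups["w"].Captures)
                words.Add(c.Value);
            chains.Add(words);
        }
        return chains;
    }

    static void Main()
    {
        // Assumed sample input: the original sentence plus a second chain.
        string text = "One word, duel. Limes said bye. Cat tree eats.";
        foreach (var chain in FindChains(text))
            Console.WriteLine(string.Join(", ", chain));
        // Prints:
        // word, duel, Limes, said
        // Cat, tree, eats
    }
}
```

Each Match object holds its own capture collection for Group "w", so the chains stay separate.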
