Reputation: 7707
I have a string "word1 word2 word3 word4 word5".
I would like to split it into an array of overlapping pairs: "word1 word2" | "word2 word3" | "word3 word4" | "word4 word5"
I can do it using a .NET split and a loop, but I'd rather do it with a regex using Regex.Split.
Here's the working split and loop:
Dim keywordPairArr As String() = Regex.Split(Trim(keywords), "[ ]")
For i As Integer = 0 To keywordPairArr.Length - 2
    Dim keyword As String = keywordPairArr(i) & " " & keywordPairArr(i + 1)
    If Not keywordDictionary.ContainsKey(keyword) Then
        ' Regex.Escape matches the pair literally; "[" & keyword & "]+" would be a character class.
        keywordDictionary.Add(keyword, Regex.Matches(keywords, Regex.Escape(keyword)).Count)
    End If
Next
Bonus: groups of N words would be nice. N=3 would output "word1 word2 word3" | "word2 word3 word4" | "word3 word4 word5"
Any help on a regex for splitting the string at every Nth space ([ ])?
Upvotes: 1
Views: 559
Reputation: 3476
You can use Regex.Matches() for this task.
Here's a C# example that will output the result:
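using System.Diagnostics;
using System.Text.RegularExpressions;

// Writes each word group found by the pattern to the trace output.
void PrintWordGroups( string input, string pattern )
{
    MatchCollection mc = Regex.Matches( input.Trim(), pattern );
    foreach ( Match m in mc )
    {
        // Group 1 is the word group without the trailing whitespace.
        Trace.WriteLine( m.Groups[1].Value );
    }
}

void PrintGroupsOf2( string input )
{
    PrintWordGroups( input, @"([^\s]+\s+[^\s]+)\s*" );
}

void PrintGroupsOf3( string input )
{
    PrintWordGroups( input, @"(([^\s]+\s+){2}[^\s]+)\s*" );
}

void PrintGroupsOfN( string input, int n )
{
    // {{ and }} are literal braces in string.Format; {0} becomes n - 1.
    string pattern = string.Format( @"(([^\s]+\s+){{{0}}}[^\s]+)\s*", n - 1 );
    PrintWordGroups( input, pattern );
}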
Assumptions:
Patterns Explained:
([^\s]+\s+[^\s]+)\s* - capture word->whitespace->word->optional whitespace (optional because the last expression won't have it, due to the Trim() operation in PrintWordGroups()).
([^\s]+\s+){2} means: capture word->whitespace twice; the pattern then finishes with another word and the optional whitespace.
string.Format( @"(([^\s]+\s+){{{0}}}[^\s]+)\s*", n - 1 ) builds the same pattern for any n; with n = 6 it yields (([^\s]+\s+){5}[^\s]+)\s*.
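One caveat about the patterns above: each match consumes the words it covers, so Regex.Matches() returns non-overlapping groups ("word1 word2", then "word3 word4"), not the sliding pairs shown in the question. A possible way to get overlapping groups is to capture the words inside a lookahead, which matches without consuming them. The following is only a sketch under that assumption; the class name WordGroups and the method Overlapping are hypothetical and are not part of the answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WordGroups
{
    // Hypothetical helper: returns every run of n consecutive words,
    // advancing one word at a time so that the runs overlap.
    public static List<string> Overlapping(string input, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        // (?<!\S)    only start at the beginning of a word
        // (?=(...))  lookahead: capture n words without consuming them,
        //            so the next match can begin at the following word
        // \S         consume one character so each match is non-empty
        string pattern = @"(?<!\S)(?=(\S+(?:\s+\S+){" + (n - 1) + @"}))\S";

        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            // Group 1 holds the n words captured by the lookahead.
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

With the input from the question, WordGroups.Overlapping("word1 word2 word3 word4 word5", 2) returns the four pairs "word1 word2", "word2 word3", "word3 word4" and "word4 word5", and n = 3 returns the three triples. The whitespace between words is kept as it appears in the input, so runs of several spaces are not normalized.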
Upvotes: 2