Reputation: 4868
I am trying to split()
a string that has words and numbers, but separated with multiple variants of whitespaces, such as <tab>
and <newline>
characters.
I was attempting this by normalising all whitespaces into single spaces and then split() the result into a string array in VBA.
I've found a similar question on this, but the answer is a routine to normalise multiple spaces only into a single space. I need a variant that can find whitespaces (ASCII chars below 32).
I tried running this routine and I get Error 13 error...
Public Function SplitRe(Text As String, Pattern As String, Optional IgnoreCase As Boolean) As String()
Static re As Object
If re Is Nothing Then
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.MultiLine = True
End If
re.IgnoreCase = IgnoreCase
re.Pattern = Pattern
SplitRe = Strings.Split(re.Replace(Text, ChrW(-1)), ChrW(-1))
End Function
Here is the string I am trying to split()
.
<tab>6219<tab><nl>
Changes to Pilot Ops Dashboard<nl>
Daniel Son<tab>Gatehouse<tab>Port of Collinsville<nl>
Medium<nl>
Support<tab>waiting<tab><nl>
30 minutes ago<nl>
8 seconds ago<nl>
5 minutes<nl>
What I am after is a string array of the following (each line is an element in the array):
6219
Changes to Pilot Ops Dashboard
Daniel Son
Gatehouse
Port of Collinsville
Medium
Support
waiting
30 minutes ago
8 seconds ago
5 minutes
Upvotes: 1
Views: 849
Reputation: 75850
You could try the following pattern and function:
Function SplitRE(s As String) As String()
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "(?:^[\t\n]+|[\t\n]*(?!.)|[\t\n]*([\t\n]))"
SplitRE = Split(Replace(.Replace(s, "$1"), Chr(9), Chr(10)), Chr(10))
End With
End Function
The pattern (?:^[\t\n]+|[\t\n]*(?!.)|[\t\n]*([\t\n]))
would match:
(?:
open non-capture group;
^[\t\n]+
- 1+ Tab- or Newline characters after star-line anchor. This is to remove these leading characters since unfortunately \A
is not available in VBA;|
- Or;[\t\n]*(?!.)
- 0+ Tab- or Newline characters with a negative lookahead to assert position is not followed by another character. This is to remove these trailing characters since unfortunately \Z
is not available to us in VBA;|
- Or;[\t\n]*([\t\n])
- 0+ Tab- or Newline characters with a capture group to hold the last of them.We then replace all matches with the content of the capture group, replace any remaining tab characters to newlines before we split on newline characters.
The above may be a bit verbose but this would prevent empty elements in your array. Example on how to invoke:
Sub test()
Dim str As String: str = Replace(Replace("<tab>6219<tab><nl>Changes to Pilot Ops Dashboard<nl>" & _
"Daniel Son<tab>Gatehouse<tab>Port of Collinsville<nl>Medium<nl>Support<tab>waiting" & _
"<tab><nl>30 minutes ago<nl>8 seconds ago<nl>5 minutes<nl>", "<tab>", Chr(9)), "<nl>", Chr(10))
arr = SplitRE(str)
End Sub
Don't mind the Replace()
functions, I just had to re-create your input string. The above would lead to following array:
Upvotes: 2