Fandango68
Fandango68

Reputation: 4868

How to split words in a string, with whitespaces using VBA?

I am trying to split() a string that has words and numbers, but separated with multiple variants of whitespaces, such as <tab> and <newline> characters.

I was attempting this by normalising all whitespaces into single spaces and then split() the result into a string array in VBA.

I've found a similar question on this, but the answer is a routine to normalise multiple spaces only into a single space. I need a variant that can find whitespaces (ASCII chars below 32).

I tried running this routine and I get Error 13 error...

Public Function SplitRe(Text As String, Pattern As String, Optional IgnoreCase As Boolean) As String()
    Static re As Object

    If re Is Nothing Then
        Set re = CreateObject("VBScript.RegExp")
        re.Global = True
        re.MultiLine = True
    End If

    re.IgnoreCase = IgnoreCase
    re.Pattern = Pattern
    SplitRe = Strings.Split(re.Replace(Text, ChrW(-1)), ChrW(-1))
End Function

Here is the string I am trying to split().

<tab>6219<tab><nl>
Changes to Pilot Ops Dashboard<nl>
Daniel Son<tab>Gatehouse<tab>Port of Collinsville<nl>
Medium<nl>
Support<tab>waiting<tab><nl>
30 minutes ago<nl>
8 seconds ago<nl>
5 minutes<nl>

What I am after is a string array of the following (each line is an element in the array):

6219
Changes to Pilot Ops Dashboard
Daniel Son
Gatehouse
Port of Collinsville
Medium
Support
waiting
30 minutes ago
8 seconds ago
5 minutes

Upvotes: 1

Views: 849

Answers (1)

JvdV
JvdV

Reputation: 75850

You could try the following pattern and function:

Function SplitRE(s As String) As String()

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = "(?:^[\t\n]+|[\t\n]*(?!.)|[\t\n]*([\t\n]))"
    SplitRE = Split(Replace(.Replace(s, "$1"), Chr(9), Chr(10)), Chr(10))
End With

End Function

The pattern (?:^[\t\n]+|[\t\n]*(?!.)|[\t\n]*([\t\n])) would match:

  • (?: open non-capture group;
    • ^[\t\n]+ - 1+ Tab- or Newline characters after star-line anchor. This is to remove these leading characters since unfortunately \A is not available in VBA;
    • | - Or;
    • [\t\n]*(?!.) - 0+ Tab- or Newline characters with a negative lookahead to assert position is not followed by another character. This is to remove these trailing characters since unfortunately \Z is not available to us in VBA;
    • | - Or;
    • [\t\n]*([\t\n]) - 0+ Tab- or Newline characters with a capture group to hold the last of them.

We then replace all matches with the content of the capture group, replace any remaining tab characters to newlines before we split on newline characters.

The above may be a bit verbose but this would prevent empty elements in your array. Example on how to invoke:

Sub test()

Dim str As String: str = Replace(Replace("<tab>6219<tab><nl>Changes to Pilot Ops Dashboard<nl>" & _
    "Daniel Son<tab>Gatehouse<tab>Port of Collinsville<nl>Medium<nl>Support<tab>waiting" & _
    "<tab><nl>30 minutes ago<nl>8 seconds ago<nl>5 minutes<nl>", "<tab>", Chr(9)), "<nl>", Chr(10))
    
arr = SplitRE(str)

End Sub

Don't mind the Replace() functions, I just had to re-create your input string. The above would lead to following array:

enter image description here

Upvotes: 2

Related Questions